Data Engineer
VULCAN ELEMENTS
Software Engineering, Data Science
Research Triangle Park, Durham, NC, USA
Vulcan Elements is manufacturing American rare-earth permanent magnets for a secure, resilient future. With a focus on national security and economic resiliency, we serve critical industries such as defense, aerospace, and automotive, powering a high-technology future. Vulcan Elements is building a team of ambitious professionals committed to Mission Focus, Technical Excellence, and Transparency.
As the Data Engineer, you will design and build the data infrastructure that makes Vulcan’s operational and business data useful — first at pilot scale, and then as the foundation for a 10,000 ton/year facility. You will work from architecture to implementation: evaluating and selecting platforms, designing data models and pipelines, and building the systems that collect, contextualize, and deliver data to the teams and tools that depend on it. You will collaborate closely with cross-functional stakeholders to translate operational requirements into a durable, scalable data architecture. As Vulcan grows, this role has the opportunity to expand into a team leadership position.
Responsibilities
Architecture & Platform Design
- Design and own Vulcan’s data architecture from operational data stores through ETL pipelines to the analytics and AI layer
- Evaluate and select platforms for the data lakehouse, ETL tooling, and operational databases, weighing scalability, compliance requirements, operational burden, and cost
- Review, refine, and implement data architecture design documents, ensuring designs are technically sound and account for CUI and ITAR data handling requirements
- Make and document key platform and design decisions with enough clarity that future team members can understand the reasoning and build on it
- Ensure the architecture scales from pilot plant to full-scale facility without fundamental redesign
- Apply sound engineering practices (version control, testing, observability, and documentation) to everything you build, and uphold those standards as the data team grows
Data Pipeline & Integration
- Design and build ETL pipelines that move data from operational data stores into the data lakehouse with full contextual enrichment, making it ready for analytics and AI workloads
- Build reliable ingest paths for structured data, time-series data, files, images, and other outputs from manufacturing and lab systems
- Collaborate across engineering, operations, and IT to understand data flows, dependencies, and integration requirements, and translate them into pipeline and architecture decisions
- Identify and eliminate manual data workflows, replacing them with monitored, reliable pipelines
- Diagnose and resolve data quality issues across the stack, and build monitoring into pipelines so problems surface early
Data Modeling & Quality
- Define data models that support operational queries, analytical workloads, and future AI and ML applications
- Own data contextualization standards, ensuring every data point carries the metadata needed to make it meaningful
- Contribute to schema design and payload definitions for operational data stores, working toward consistency and legibility across the organization
- Support the development of reporting and visibility tools that give operations and leadership clear insight into process and quality data
- Write clear technical documentation for architecture decisions, data models, pipeline designs, and operational runbooks
Responsibilities and tasks outlined are not exhaustive and may change as determined by the needs of the business.
Qualifications
- 8+ years of experience in data engineering, data infrastructure, or a closely related technical role with a track record of owning and delivering production systems
- Demonstrated experience designing and building data lakes, lakehouses, or analytical data stores; understands the tradeoffs between platforms and can make and defend platform selection decisions
- Strong experience designing and building ETL/ELT pipelines that enrich and contextualize data
- Deep fluency with data modeling for both operational and analytical workloads; can design schemas that serve present needs without foreclosing future ones
- Experience with relational databases (PostgreSQL, SQL Server, or similar); writes and debugs SQL confidently
- Comfortable working in a fast-moving environment with a small team, making decisions with incomplete information and documenting them clearly for future colleagues
- Strong communicator who can work across technical and non-technical stakeholders and translate between operational requirements and data architecture decisions
- Must be a U.S. Person due to required access to U.S. export-controlled information or facilities
Desired Skills
- Experience with time-series databases (InfluxDB, TimescaleDB, or similar) common in industrial and IoT environments
- Familiarity with industrial data concepts — historian data, process tags, OT/IT integration — and the data challenges specific to manufacturing environments
- Experience working on or alongside a Unified Namespace or MQTT-based data architecture; understands how industrial messaging infrastructure relates to the data layer
- Familiarity with data lakehouse platforms and open table formats (Delta Lake, Apache Iceberg, or similar)
- Experience with ETL orchestration tooling (Airflow, Prefect, dbt, or similar)
- Comfort with scripting and lightweight development (Python, SQL, or similar) for pipeline development and data quality tooling
- Familiarity with cloud platforms (AWS, Azure, or GCP) and experience evaluating on-premises vs. cloud tradeoffs for data infrastructure
- Experience working in a controlled information environment; familiarity with the handling requirements for Controlled Unclassified Information (CUI) or export-controlled technical data under ITAR or EAR
- Experience in a manufacturing, industrial, or operations-heavy environment