Group Engineering Manager

Weka.IO

Software Engineering
Tel Aviv-Yafo, Tel Aviv District, Israel
Posted on Mar 6, 2026
WEKA is transforming how organizations build, run, and scale AI and accelerated compute workflows with NeuralMesh™, our intelligent, adaptive mesh storage system. Unlike traditional data infrastructures, which become more fragile as compute environments grow and performance demands increase, NeuralMesh becomes faster, stronger, and more efficient as it scales, providing a flexible, adaptable foundation for enterprise and agentic AI innovation that maximizes GPU utilization, accelerates time to first token, and lowers the cost of innovation.

WEKA is a growth-stage company backed by world-class venture capital investors and AI infrastructure industry leaders. Our technology, purpose-built for AI, has garnered over 140 patents and is trusted by more than 30% of Fortune 50 enterprises, as well as the world’s leading hyperscalers, neoclouds, and AI innovators. Our team is customer-obsessed and works accountably, boldly, and collaboratively to ensure our customers' success. If we sound like your kind of people, join us!

About The Position

WEKA is transforming how organizations build, run, and scale AI and accelerated compute workflows with NeuralMesh™. To ensure our storage mesh is unbreakable, we cannot rely on standard testing methodologies. We need to build a validation engine that is as complex and sophisticated as the product itself.

We are looking for a Group Lead to architect our System Verification & Simulation efforts. This is not a traditional QA role. We are seeking a veteran Software Engineer or Systems Architect to build the "adversarial" distributed system that pushes our platform to its theoretical limits.

Your Mandate Is Twofold

  • Build a validation platform capable of simulating massive-scale AI workloads.
  • Transform the group into an AI-native engineering force. You will leverage Generative AI and LLMs to multiply engineering velocity, automate complex scenario generation, and achieve 1000x the coverage of traditional teams.

What You Will Build & Lead

  • Engineering, Not Scripting: You will lead a team of 20+ engineers who write production-grade code (Python, Go, C++) to create a validation ecosystem. Your team operates with the same rigorous code reviews, design docs, and architectural standards as the Core Kernel team.
  • Lead the AI-Native Transformation: You will pioneer the integration of AI agents and LLMs into the engineering workflow. This means using AI for synthetic workload generation, automated root-cause analysis of crash dumps, and intelligent test code generation, making your team an ultra-productive powerhouse.
  • Architecting "The Breaker": Design and build a massive-scale distributed validation framework. You will focus on orchestrating millions of concurrent IO operations to hunt down race conditions, memory leaks, and network partition behaviors that only appear at extreme scale.
  • Simulation & Chaos: Evolve our infrastructure to simulate real-world entropy. You will champion the development of tools that inject latency, packet loss, and hardware failures into the cluster to prove resilience (Chaos Engineering).
  • Deep-System Telemetry: Move beyond "Pass/Fail." You will build the observability pipelines that track P99 latency, jitter, and recovery times, providing deep feedback loops to the Core team on architectural bottlenecks.

Requirements

  • Engineering DNA: 10+ years of experience in core Software Engineering or Systems Architecture. You have a background in building systems, not just testing them.
  • Scale Leadership: 8+ years of leadership experience, managing managers and groups of 15+ engineers in high-velocity environments.
  • AI-Augmented Engineering: You are an early adopter of AI in the SDLC. You have experience or a strong vision for using LLMs, Copilot, or custom AI agents to accelerate code writing, debug complex failures, and generate test scenarios at scale.
  • Distributed Systems Expertise: You possess deep knowledge of Storage (Object, Block, File), HPC, or Cloud Infrastructure. You understand CAP theorem, consensus algorithms (Raft/Paxos), and the complexities of distributed locking and metadata consistency.
  • Strong Coding Chops: Although this is a management role, you are fluent in code. You are proficient in Python for orchestration and can read and debug C++, Rust, or Go to understand the core system's internals.
  • The "SRE" Mindset: You approach quality through the lens of Site Reliability Engineering and System Verification. You prioritize Mean Time to Detection (MTTD) and automated remediation.

Why This Role?

If you are a backend engineer who is tired of building features and wants to tackle the hardest problem in computer science (verifying correctness in a distributed asynchronous system using cutting-edge AI), this is your home. You won't just find bugs; you will build the intelligent technology that proves they don't exist.