We're hiring an Engineering Manager for our Site Reliability Engineering organization to lead the team that keeps Together AI's production infrastructure running. SRE at Together is roughly 20 engineers organized into three function areas: bare-metal / day-0 / day-2 operations, our inference platform, and our virtual clusters platform. Each function area is led by a technical lead; you'll partner with them to manage and develop the engineers in your timezone.
This is a true player-coach role: roughly 50-60% management and 40-50% hands-on technical work. You'll code, participate in architectural discussions, lead incident response, and stay close enough to the systems to coach effectively. You'll also do the work that makes a team great over time: build trust, develop engineers, hire, and shape the operating rhythms that determine whether the team thrives or burns out. The honest version: the team is talented but currently stretched thin, and a major part of this role is helping them shift from reactive, manual operations to systemic, automation-first work. If that kind of leadership challenge energizes you, this is an unusually high-impact opportunity.
Together's infrastructure isn't a standard cloud setup: we run substantial bare-metal GPU compute, public-cloud Kubernetes for our inference platform, and Kubernetes-with-virtualization for our virtual clusters platform. You don't need to be deep in all of it, but you do need real depth in at least one. The two EM hires will complement each other across these areas, so we're looking at the team profile holistically.
Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.
We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $250,000 - $325,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.
Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
Please see our privacy policy at https://www.together.ai/privacy