Senior Product Manager, Agentic Science
Your work will change lives. Including your own. The Impact You’ll Make Recursion is leading an era of autonomous science – iterating across the discovery process, leveraging machine learning and agents built-for-purpose to uncover novel insights across biology and chemistry, fueling our clinical-stage pipeline. As the Senior Product Manager for Agentic Science, you will sit at the intersection of drug discovery and AI, serving as the critical translator between our scientific teams and the agentic systems they depend on. Your primary focus will be on outcomes: ensuring our agents impact Recursion’s pipeline through demonstrably accelerating discovery. This role is not about maintaining a static roadmap; it is about navigating the frontier of a rapidly evolving field. You will partner closely with drug discovery scientists to understand their workflows, translate their needs into agent task definitions, and design and maintain the benchmarks that ensure our agents are driving value. As the field evolves rapidly, you will help the team distinguish real scientific progress from superficially promising results. In this role, you will: - Champion Benchmarking for Agentic Science: Drive alignment across scientific and technical teams around evaluation frameworks that measure agent performance against scientifically meaningful outcomes, continuously refining them as the field evolves. - Drive Outcome-Focused Product Development: Keep the team anchored to what matters: building agents that meaningfully advance drug discovery programs, not just executing tasks. - Evangelize the "Human-in-the-loop" Evolution: Work with scientific stakeholders to define interfaces where humans review, validate, and shape agent reasoning, ensuring our scientists evolve from "operators" to "architects" of discovery. - Monitor the Competitive Benchmark Landscape: Track how leading organizations across pharma AI, biotech, and foundation model research are measuring agentic performance.
Ensure Recursion's evaluation frameworks stay calibrated against external standards, so our benchmarks reflect genuine scientific progress rather than internally optimized metrics. The Team You’ll Join You will join a cross-functional team of software engineers, data scientists, AI/ML scientists and drug discovery biologists and chemists who build the technical bedrock that enables autonomous science, including agent orchestration, guardrails, and the connectivity between our digital and physical assets. You will work closely with the Discovery teams (the users of these agents), the AI Research teams (who build cutting-edge models), and our automated biology and chemistry lab teams (who generate the data that feeds into the models). The Experience You’ll Need - Background in Drug Discovery or AI-driven Science: You have direct experience working in drug discovery, biotech, or AI-driven scientific research. Ideally, you will have worked hands-on with agentic systems (or agents) in a scientific context. You can credibly partner with PhD-level scientists and translate between scientific goals and technical systems without losing fidelity on either side. - Fluency
Internship - Search Machine Learning Engineer
Perplexity is looking for a Search Machine Learning Engineer Intern to help build the next generation of advanced search technologies, with a focus on retrieval and ranking. You will work closely with experienced engineers to improve search quality, experiment with new models, and ship features that directly impact how users search and discover information. Internship program: 12 - 24 weeks, full-time, in-person in the Belgrade office. Responsibilities: - Contribute to experiments that improve search quality through better models, data usage, and evaluation tools, under the guidance of senior engineers. - Design and implement components of the search platform and model stack, including retrieval, ranking, and classification models. - Train and evaluate models (including LLM-based approaches) for retrieval, ranking, and classification tasks. - Support deployment and monitoring of search and ranking models in a scalable and performant way. - Help build and iterate on RAG pipelines for grounding and answer generation. - Collaborate with Data, AI, Infrastructure and Product teams to deliver improvements quickly and learn best practices in production ML. Qualifications: - Strong foundation in machine learning and statistics, with coursework or projects related to information retrieval, ranking, or recommender systems. - Experience with Python and common ML frameworks (e.g. PyTorch, TensorFlow, JAX) through academic, open source, or personal projects. - Familiarity with evaluating model quality using offline metrics and/or A/B testing is a plus, but not required. - Previous experience (internships, research, or significant projects) working on search, recommendation, or NLP is a plus, but not required. - Self-driven and curious, with a strong sense of ownership, willingness to learn, and comfort working in a fast-paced environment. - Experience with Rust is a plus.
Senior Computational Biologist – Target ID
Your work will change lives. Including your own. The Impact You’ll Make As a computational biology specialist on our Target ID team, you will be at the forefront of transitioning Recursion to its next era of drug discovery. You will serve as a critical biological anchor for a highly technical data science team building the next generation of our target discovery pipelines. The team’s mission is to identify novel therapeutic targets at scale across the genome for hundreds of indications. You will have access to all of Recursion's data layers—including massive internal maps, functional genomics, and rich patient data (transcriptomics, genetics, and Real-World Data/EHR)—and will be tasked with proposing, piloting, and deploying new methods to integrate these datasets. Crucially, you will leverage your biological subject matter expertise to orient and focus your data scientist and software engineer peers towards reliable and correct usage of biological data and feasible candidate programs. As we build out semi-automated and agentic tools (e.g., multi-agent LLM systems for portfolio oversight and target assessment), your deep real-world biology experience will guide the development of these tools, ensuring they are grounded in biological reality and translate to meaningful portfolio and patient impact. The ideal candidate is a computational biology specialist with high fluency in data science tech stacks. You know how to build and evaluate predictive models and build agentic pipelines, and your differentiator is your deep understanding of patient-relevant datasets, disease biology, and target discovery. In this role, you will: - Discover & Evaluate: Propose and validate novel targets using deep integration of patient data (transcriptomics, population genetics, EHR, etc) and Recursion's internal multi-omic data layers. - Guide & Automate: Collaborate closely with data scientists to build, refine, and guide semi-automated and agentic target discovery tools. 
You will ensure these tools are biologically sound and can scale to evaluate hypotheses across many indications spanning a wide range of therapeutic areas. - Triangulate: Use advanced statistical methods (e.g., causal inference, survival modeling) to establish confident target-to-patient connections and define specific addressable patient populations for early pipeline programs. - Bridge the Gap: Translate complex platform findings into disease-relevant applications, bridging the gap between high-dimensional data science output and actionable drug discovery insights. - Present & Influence: Communicate complex biological rationale and data analyses to decision-makers and cross-functional stakeholders, driving data-backed "go/no-go" decisions in a two-stage target approval process. The Team You’ll Join Our group is a bold, agile, diverse collective of data scientists and computational biologists driving Recursion’s early portfolio strategy. We are focused on aggressively expanding our early-stage pipeline with highly validated, novel therapeutic targets. To achieve this, we prioritize defining specific patient populations and establishing clear, data-backed translational paths from day one of every new program. Because this
Solutions Engineer, Enterprise
Scale plays a vital role in the development of AI applications. Our customer base is growing exponentially, and you will be on the front lines, ensuring that the world's most innovative companies become passionate, lifelong Scale customers. Solutions Engineers partner closely with AEs, Product, and MLEs to lead prospective customers through pre-sales, delivering customized demos and pilots to secure the “technical win”. Solutions Engineers scope customer technical requirements and develop an actionable SOW. They will work closely with the delivery team to help with initial implementation. Solutions Engineers are relentlessly curious about customer needs and pain points. They employ their expert Scale product knowledge and GenAI knowledge to design solutions that best address these needs. Solutions Engineers are strong relationship builders, great project managers, and provide technical expertise. You will: - Partner with Scale AEs on the customer journey, delivering tailored demos and prototypes according to the customer's requirements. - Develop technical domain expertise in Generative AI / large language model applications for Enterprise use cases, including customers in financial services, insurance, SaaS, and similar enterprises. - Be accountable for securing the “technical win” by unblocking technical challenges - Interact with customers daily to understand their needs and design solutions to better serve them. - Design and develop “Scopes of Work” by breaking down customer challenges into a project plan - Work closely with forward-deployed Software and Machine learning Engineers to develop agents in the initial post-sales stage - Work with AEs and PMs to identify customer-specific feature requests. - Drive strategic initiatives to improve the efficiency and effectiveness of the Solution Engineering team. Ideally, you'd have: - Strong engineering background with prior experience working with clients in a pre or post-sales capacity to realize business goals. 
- Prior experience developing with Python, Java and/or other web development languages. - Experience working in enterprise SaaS, cloud tech, finance, fintech or similar industries in a technical capacity with end-customer engagement. - A track record as a self-starter, motivated to independently unblock technical issues in the field with the customer, away from the mothership. - Presentation skills with a high degree of technical credibility when speaking with executives and front-line engineers. - High level of comfort communicating effectively across internal and external organizations. - Intellectual curiosity, empathy, and ability to operate with high velocity. Nice to haves: - GenAI Experience - Forward deployed engineering experience - Machine Learning Experience Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equ
Senior/Staff Machine Learning Engineer, General Agents, Enterprise GenAI
Scale AI is the data foundation for AI, helping organizations build and deploy reliable production AI applications. We partner with leading enterprises and government organizations to accelerate their AI initiatives through our data annotation platform, generative AI solutions, and enterprise AI capabilities. About the General Agents Team The General Agents team, part of Scale’s Enterprise organization, builds robust general agents for customer use cases and applications. The team sits at the intersection of frontier agent development and real-world deployment, translating state-of-the-art reasoning and agentic capabilities into reliable, production-grade systems that drive real economic value. Our agents are scalable systems built around recurring enterprise problem domains, with a strong emphasis on generalization, extensibility, and deployment across many customers. About the Role As a Senior/Staff Machine Learning Engineer (MLE) on the General Agents team, you’ll play a critical role in designing, building, and deploying production-ready AI agents that solve high-impact enterprise problems. You will work across the full agent lifecycle—from model and system design to evaluation, deployment, and iteration—bridging cutting-edge agentic techniques with the constraints and requirements of real customer environments. You will: - Design and implement end-to-end agent systems that combine LLM reasoning, tool use, memory, and control logic to solve recurring enterprise use cases. - Build scalable, reliable agent architectures that can be deployed across many customers with varying data, tools, and constraints. - Develop evaluation frameworks, datasets, environments, and metrics to measure agent performance, reliability, and business impact in production settings. - Collaborate closely with product managers, customers, data annotators, and other engineering teams to translate enterprise requirements into robust agent designs. 
- Productionize frontier agent techniques (e.g., planning, multi-step reasoning and tool-use, multi-agent patterns) into maintainable, observable systems. - Own deployment, monitoring, and iteration of agent systems, including failure analysis and continuous improvement based on real-world usage. - Contribute to technical direction and architectural decisions for general agent development best practices and methods, with increasing scope and leadership at the Staff level. Ideally you’d have: - 5+ years of experience building and deploying machine learning or AI systems for real-world, production use cases. - Strong engineering fundamentals, supported by a Bachelor’s and/or Master’s degree in Computer Science, Machine Learning, AI, or equivalent practical experience. - Deep understanding of modern LLMs, prompt-, context-, and system-level optimization, and agentic system design. - Proven proficiency in Python, including writing production-quality, testable, and maintainable code. - Experience building systems that integrate models with external tools, APIs, databases, and services. - Ability to operate in ambiguous problem spaces, balancing research-driven approaches with pragmatic product constraints. - Strong communication skills and comfort working in customer-facing or cross-functional environments. Nice-to-haves: - Hands-on experience building AI agents using modern generative AI stacks (OpenAI APIs, commercial or open-source LLMs).
ML Infrastructure Engineer, Safeguards
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role We are seeking a Machine Learning Infrastructure Engineer to join our Safeguards organization, where you'll build and scale the critical infrastructure that powers our AI safety systems. You'll work at the intersection of machine learning, large-scale distributed systems, and AI safety, developing the platforms and tools that enable our safeguards to operate reliably at scale. As part of the Safeguards team, you'll design and implement ML infrastructure that powers Claude safety. Your work will directly contribute to making AI systems more trustworthy and aligned with human values, ensuring our models operate safely as they become more capable. 
Responsibilities: - Design and build scalable ML infrastructure to support real-time and batch classifier and safety evaluations across our model ecosystem - Build monitoring and observability tools to track model performance, data quality, and system health for safety-critical applications - Collaborate with research teams to productionize safety research, translating experimental safety techniques into robust, scalable systems - Optimize inference latency and throughput for real-time safety evaluations while maintaining high reliability standards - Implement automated testing, deployment, and rollback systems for ML models in production safety applications - Partner with Safeguards, Security, and Alignment teams to understand requirements and deliver infrastructure that meets safety and production needs - Contribute to the development of internal tools and frameworks that accelerate safety research and deployment You may be a good fit if you: - Have 5+ years of experience building production ML infrastructure, ideally in safety-critical domains like fraud detection, content moderation, or risk assessment - Are proficient in Python and have experience with ML frameworks like PyTorch, TensorFlow, or JAX - Have hands-on experience with cloud platforms (AWS, GCP) and container orchestration (Kubernetes) - Understand distributed systems principles and have built systems that handle high-throughput, low-latency workloads - Have experience with data engineering tools and building robust data pipelines (e.g., Spark, Airflow, streaming systems) - Are results-oriented, with a bias towards reliability and impact in safety-critical systems - Enjoy collaborating with researchers and translating cutting-edge research into production systems - Care deeply about AI safety and the societal impacts of your work Strong candidates may have experience with: - Working with large language models and modern transformer architectures - Implementing A/B testing frameworks and 
experimentation infrastructure for ML systems - Developing monitoring and alerting systems for ML model performance and data drift - Building automated labeling systems and human-in-the-loop workflows - Ex
Research Engineer, Knowledge Team
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role: We are looking for Research Engineers to help us redesign how Claude interacts with external data sources. Many of the paradigms for how data and knowledge bases are organized assume human consumers and constraints. This is no longer true in a world of LLMs! Your job will be to design new architectures for how information is organized, and train language models to optimally use those architectures. Responsibilities: - Designing and implementing from scratch new information architecture strategies - Performing finetuning and reinforcement learning to teach language models how to interact with new information architectures - Building “hard” knowledge base eval sets to help identify failure modes of how language models work with external data - Designing and evaluating advanced agentic search capabilities. You may be a good fit if you: - Are a very experienced Python programmer who can quickly produce reliable, high quality code that your teammates love using - Have good machine learning research experience - Have experience developing software that utilizes Large Language Models such as Claude - Are results-oriented, with a bias towards flexibility and impact - Pick up slack, even if it goes outside your job description - Enjoy pair programming (we love to pair!) 
- Want to partner with world-class ML researchers to develop new LLM capabilities - Care about the societal impacts of your work - Have clear written and verbal communication Strong candidates will also have experience with: - Collaborating with product teams to quickly prototype and deliver innovative solutions - Building complex agentic systems that utilize LLMs - Developing scalable distributed information retrieval systems, such as search engines, knowledge graphs, RAG, indexing, ranking, query understanding, and distributed data processing The annual compensation range for this role is listed below. For sales roles, the range provided is the role’s On Target Earnings ("OTE") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role. Annual Salary: $350,000 — $850,000 USD Logistics Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience Minimum years of experience: Years of expe
Research Engineer, Performance RL
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the RL Teams Our Reinforcement Learning teams lead Anthropic's reinforcement learning research and development, playing a critical role in advancing our AI systems. We've contributed to all Claude models, with significant impacts on the autonomy and coding capabilities of Claude Sonnet 4.6 and Opus 4.6. Our work spans several key areas: - Developing systems that enable models to use computers effectively - Advancing code generation through reinforcement learning - Pioneering fundamental RL research for large language models - Building scalable RL infrastructure and training methodologies - Enhancing model reasoning capabilities We collaborate closely with Anthropic's alignment and frontier red teams to ensure our systems are both capable and safe. We partner with the applied production training team to bring research innovations into deployed models, and are dedicated to implementing our research at scale. Our Reinforcement Learning teams sit at the intersection of cutting-edge research and engineering excellence, with a deep commitment to building high-quality, scalable systems that push the boundaries of what AI can accomplish. About the Role We're hiring for the Code RL team within the RL organization. As a Research Engineer, you'll advance our models' ability to safely write correct, fast code for accelerators. You'll need to know accelerator performance well to turn it into tasks and signals models can learn from. Specifically, you will: - Invent, design and implement RL environments and evaluations. - Conduct experiments and shape our research roadmap. - Deliver your work into training runs.
- Collaborate with other researchers, engineers, and performance engineering specialists across and outside Anthropic. You may be a good fit if you: - Have expertise with accelerators (CUDA, ROCm, Triton, Pallas), ML framework programming (JAX or PyTorch). - Have worked across the stack – kernels, model code, distributed systems. - Know how to balance research exploration with engineering implementation. - Are passionate about AI's potential and committed to developing safe and beneficial systems. Strong candidates may also have: - Experience with reinforcement learning. - Experience porting ML workloads between different types of accelerators. - Familiarity with LLM training methodologies. The annual compensation range for this role is listed below. For sales roles, the range provided is th
Research Scientist, Agent Robustness
Scale Labs, Research Scientist — Agent Robustness As the leading data and evaluation partner for frontier AI companies, Scale plays an integral role in understanding the capabilities and safeguarding AI models and systems. Building on this expertise, Scale Labs has launched a new team focused on policy research, to bridge the gap between AI research and global policymakers to make informed, scientific decisions about AI risks and capabilities. Our research tackles the hardest problems in agent robustness, AI control protocols, and AI risk evaluations to help governments, industry, and the public understand and mitigate AI risk while maximizing AI adoption. This team collaborates broadly across industry, the public sector, and academia and regularly publishes our findings. We are actively seeking talented researchers to join us in shaping this vision. As a Research Scientist working on Agent Robustness you will work on the fundamental challenges of building AI agents that are safe and aligned with humans. For example, you might: - Research the science of AI agent capabilities with a focus on how they relate to safety, risk factors, and methodologies for benchmarking them; - Design and build harnesses to test AI agents’ tendency to take harmful actions when pressured to do so by users or tricked into doing so by elements of their environment; - Design and build exploits and mitigations for new and unique failure modes that arise as AI agents gain affordances like coding, web browsing, and computer use; - Characterize and design mitigations for potential failure modes or broader risks of systems involving multiple interacting AI agents. Ideally you’d have: - Commitment to our mission of promoting safe, secure, and trustworthy AI deployments in the industry as frontier AI capabilities continue to advance. - Practical experience conducting technical research collaboratively. 
You should be comfortable building and leveraging agent scaffolding, designing evaluation harnesses, and quickly turning new ideas from the research literature into working prototypes. - Experience with post-training and RL techniques such as RLHF, DPO, GRPO, and similar approaches. - A track record of published research in machine learning, particularly in generative AI. - At least three years of experience addressing sophisticated ML problems, whether in a research setting or in product development. - Strong written and verbal communication skills to operate in a cross-functional team. Nice to have: - Hands-on experience with agent evaluation frameworks such as SWE-bench, WebArena, OSWorld, Inspect, or similar tools. - Experience with red-teaming, prompt injection, or adversarial testing of AI systems. Our research interviews are crafted to assess candidates' skills in practical ML prototyping and debugging, their grasp of research concepts, and their alignment with our organizational culture. We will not ask any LeetCode-style questions. If you’re excited about advancing AI safety and contributing to our mission, we encourage you to apply, even if your experience doesn’t perfectly align with every requirement. Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and m
Research Scientist, Frontier Risk Evaluations
Scale Labs, Research Scientist — Frontier Risk Evaluations As the leading data and evaluation partner for frontier AI companies, Scale plays an integral role in understanding the capabilities and safeguarding AI models and systems. Building on this expertise, Scale Labs has launched a new team focused on policy research, to bridge the gap between AI research and global policymakers to make informed, scientific decisions about AI risks and capabilities. Our research tackles the hardest problems in agent robustness, AI control protocols, and AI risk evaluations to help governments, industry, and the public understand and mitigate AI risk while maximizing AI adoption. This team collaborates broadly across industry, the public sector, and academia and regularly publishes our findings. We are actively seeking talented researchers to join us in shaping this vision. As a Research Scientist focused on Frontier Risk Evaluations, you will design and create evaluation measures, harnesses and datasets for measuring the risks posed by frontier AI systems. For example, you might do any or all of the following: - Design and build harnesses to test AI models and systems (including agents) for dangerous capabilities such as security vulnerability exploitation, CBRN uplift, and other high-risk activities; - Work with government agencies or other labs to collectively scope and design evaluations to measure and mitigate risks posed by advanced AI systems; - Publish evaluation methodologies and write technical reports for policymakers. Ideally you’d have: - Commitment to our mission of promoting safe, secure, and trustworthy AI deployments in the industry as frontier AI capabilities continue to advance. - Practical experience conducting technical research collaboratively. You should be comfortable building and instrumenting ML pipelines, writing evaluation harnesses, and quickly turning new ideas from the research literature into working prototypes. 
- A track record of published research in machine learning, particularly in generative AI. - At least three years of experience addressing sophisticated ML problems, whether in a research setting or in product development. - Strong written and verbal communication skills to operate in a cross-functional team. Nice to have: - Experience in crafting evaluations and benchmarks, or a background in data science roles related to LLM technologies. - Experience with red-teaming or adversarial testing of AI systems. - Familiarity with AI safety policy frameworks (e.g., NIST AI RMF, EU AI Act, Korea AI Basic Act). Our research interviews are crafted to assess candidates' skills in practical ML prototyping and debugging, their grasp of research concepts, and their alignment with our organizational culture. We will not ask any LeetCode-style questions. If you’re excited about advancing AI safety and contributing to our mission, we encourage you to apply, even if your experience doesn’t perfectly align with every requirement. Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including jo
Senior Staff Frontier Agents Engineer
About Scale AI Scale AI is the data foundation for AI, helping organizations build and deploy reliable production AI applications. We partner with leading enterprises and government organizations to accelerate their AI initiatives through our data annotation platform, generative AI solutions, and enterprise AI capabilities. Role Overview As a Senior Staff Forward Deployed AI Engineer on our Enterprise team, you'll be the technical bridge between Scale AI's cutting-edge AI capabilities and our most strategic customers. You'll work with enterprise clients to understand their unique challenges, architect custom AI solutions, and ensure successful deployment and adoption of AI systems in production environments. This is a hands-on technical role that combines deep engineering expertise with customer-facing problem solving. You'll work directly with customer engineering teams to integrate AI into their critical workflows. Key Responsibilities Customer Integration & Deployment - Partner directly with enterprise customers to understand their technical infrastructure, data pipelines, and business requirements - Design and implement custom integrations between Scale AI's platform and customer data environments (cloud platforms, data warehouses, internal APIs) - Build robust data connectors and ETL pipelines to ingest, process, and prepare customer data for AI workflows - Deploy and configure AI models and agents within customer security and compliance boundaries AI Agent Development - Develop production-grade AI agents tailored to customer use cases across domains like customer support, data analysis, content generation, and workflow automation - Architect multi-agent systems that orchestrate between different models, tools, and data sources - Implement evaluation frameworks to measure agent performance and iterate toward business objectives - Design human-in-the-loop workflows and feedback mechanisms for continuous agent improvement Prompt Engineering & Optimization - 
Create sophisticated prompt engineering strategies optimized for customer-specific domains and data - Build and maintain prompt libraries, templates, and best practices for customer use cases - Conduct systematic prompt experimentation and A/B testing to improve model outputs - Implement RAG (Retrieval Augmented Generation) systems and fine-tuning pipelines where appropriate Technical Leadership & Collaboration - Serve as the primary technical point of contact for strategic enterprise accounts - Collaborate with customer data scientists, ML engineers, and software developers to ensure smooth integration - Provide technical training and knowledge transfer to customer teams - Work closely with Scale's product and engineering teams to translate customer needs into product improvements - Document technical architectures, integration patterns, and best practices Problem Solving & Innovation - Debug complex technical issues across the entire stack, from data pipelines to model outputs - Rapidly prototype solutions to unblock customers and prove out new use cases
Research Engineer / Research Scientist, Pre-training
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the team We are seeking passionate Research Scientists and Engineers to join our growing Pre-training team in Zurich. We are involved in developing the next generation of large language models. The team primarily focuses on multimodal capabilities: giving LLMs the ability to understand and interact with modalities other than text. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems. Responsibilities In this role you will interact with many parts of the engineering and research stacks. - Conduct research and implement solutions in areas such as model architecture, algorithms, data processing, and optimizer development - Independently lead small research projects while collaborating with team members on larger initiatives - Design, run, and analyze scientific experiments to advance our understanding of large language models - Optimize and scale our training infrastructure to improve efficiency and reliability - Develop and improve dev tooling to enhance team productivity - Contribute to the entire stack, from low-level optimizations to high-level model design Qualifications & Experience We encourage you to apply even if you do not believe you meet every single criterion. Because we focus on so many areas, the team is looking for both experienced engineers and strong researchers, and we encourage anyone along the researcher/engineer spectrum to apply. 
- Degree (BA required, MS or PhD preferred) in Computer Science, Machine Learning, or a related field - Strong software engineering skills with a proven track record of building complex systems - Expertise in Python and deep learning frameworks - Have worked on high-performance, large-scale ML systems, particularly in the context of language modeling - Familiarity with ML Accelerators, Kubernetes, and large-scale data processing - Strong problem-solving skills and a results-oriented mindset - Excellent communication skills and ability to work in a collaborative environment You'll thrive in this role if you - Have significant software engineering experience - Are able to balance research goals with practical engineering constraints - Are happy to take on tasks outside your job description to support the team - Enjoy pair programming and collaborative work - Are eager to learn more about machine learning research
Research Lead, Training Insights
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role As a Research Lead on the Training Insights team, you'll develop the strategy for, and lead execution on, how we measure and characterize model capabilities across training and deployment. This is a hands-on leadership role: you'll drive original research into new evaluation methodologies while leading a small team of researchers and research engineers doing the same. Your work will span the full lifecycle of model development. You'll research and build new long-horizon evaluations that test the boundaries of what our models can achieve, develop novel approaches to measuring emerging capabilities, and deepen our understanding of how those capabilities develop — both during production RL training and after. You'll also take a cross-organizational view, working across Reinforcement Learning, Pretraining, Inference, Product, Alignment, Safeguards, and other teams to map the landscape of model evaluations at Anthropic and identify critical gaps in coverage. This role carries significant visibility and impact. You'll help shape the evaluation narrative for model releases, contributing directly to how Anthropic communicates about its models to both internal and external audiences. Done well, you will change how the industry measures and understands model capabilities, significantly furthering our safety mission. 
Responsibilities: - Build novel, long-horizon evaluations - Develop novel measurement approaches for understanding how model capabilities emerge and evolve during RL training - Lead strategic evaluation coverage across the company - Shape the evaluation narrative for model releases - Lead and mentor a small team of researchers and research engineers, setting research direction and fostering a culture of rigorous, creative research - Design evaluation frameworks that balance scientific rigor with the practical demands of production training schedules - Build and maintain relationships across Anthropic's research organization to ensure evaluation insights inform training and deployment decisions - Contribute to the broader research community through publications, open-source contributions, or external engagement on evaluation best practices You may be a good fit if you: - Have significant experience designing and running evaluations for large language models or similar complex ML systems - Have led technical projects or teams, either formally or through sustained ownership of critical research directions - Are equally comfortable designing experiments and writing code—you can move between research and implementation fluidly - Think strategically about what to measure and why, not just how to measure it - Can synthesize information across multiple teams and workstreams to form a coherent picture of model capabilities - Communicate complex technical findings clearly to both technical and non-technical audiences - Are results-oriented and thrive in fast-paced environments where priorities shift based on research findings
Member of Technical Staff (Machine Learning Research Engineer)
Perplexity is seeking an experienced Machine Learning Research Engineer to help build the next generation of advanced search technologies, with a focus on retrieval and ranking. Responsibilities - Relentlessly push search quality forward — through models, data, tools, or any other leverage available - Architect and build core components of the search platform and model stack - Design, train, and optimize large-scale deep learning models using frameworks like PyTorch, leveraging distributed training (e.g., PyTorch Distributed, DeepSpeed, FSDP) and hardware acceleration, with a focus on retrieval and ranking models - Conduct advanced research in representation learning, including contrastive learning, multilingual, and multimodal modeling for search and retrieval - Deploy models — from boosting algorithms to LLMs — in a scalable and performant way - Build and optimize RAG pipelines for grounding and answer generation - Collaborate with Data, AI, Infrastructure, and Product teams to ensure fast and high-quality delivery Qualifications - Deep understanding of search and retrieval systems, including quality evaluation principles and metrics - Proven track record with large-scale search or recommender systems - Strong proficiency with PyTorch, including experience in distributed training techniques and performance optimization for large models - Expertise in representation learning, including contrastive learning and embedding space alignment for multilingual and multimodal applications - Strong publication record in AI/ML conferences or workshops (e.g., NeurIPS, ICML, ICLR, ACL, CVPR, SIGIR) - Self-driven, with a strong sense of ownership and execution - Minimum of 3 years (preferably 5+) working on search, recommender systems, or closely related research areas
Anthropic Fellows Program
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. We are accepting applications on a rolling basis for the next cohort of Anthropic Fellows, which is expected to start in late September. In some circumstances, we can accommodate fellows starting outside the usual cohort timelines — please note in your application if the September start date doesn't work for you. Anthropic Fellows Program overview The Anthropic Fellows Program is designed to foster AI research and engineering talent. We provide funding and mentorship to promising technical talent - regardless of previous experience. Fellows will primarily use external infrastructure (e.g. open-source models, public APIs) to work on an empirical project aligned with our research priorities, with the goal of producing a public output (e.g. a paper submission). In one of our earlier cohorts, over 80% of fellows produced papers. We run multiple cohorts of Fellows each year and review applications on a rolling basis. This application is for cohorts starting in July 2026 and beyond. What to expect - 4 months of full-time research - Direct mentorship from Anthropic researchers - Access to a shared workspace (in either Berkeley, California or London, UK) - Connection to the broader AI safety and security research community - Weekly stipend of 3,850 USD / 2,310 GBP / 4,300 CAD + benefits (these vary by country) - Funding for compute (~$15k/month) and other research expenses Interview process The interview process will include an initial application & reference check, technical assessments & interviews, and a research discussion. We encourage you to apply even if you do not believe you meet every single qualification. 
Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team. Compensation The expected base stipend for this role is 3,850 USD / 2,310 GBP / 4,300 CAD per week, with an expectation of 40 hours per week for 4 months (with possible extension). Fellows workstreams Due to the success of the Anthropic Fellows for AI Safety Research program, we are now expanding it across teams at Anthropic. We expect there to be significant overlap in the types of skills and responsibilities across the roles and will by default consider candidates for all the workstreams
Frontier Agent Engineering Manager, Enterprise
About Scale AI Scale AI is the data foundation for AI, helping organizations build and deploy reliable production AI applications. We partner with leading enterprises and government organizations to accelerate their AI initiatives through our data annotation platform, generative AI solutions, and enterprise AI capabilities. Role Overview As a Forward Deployed AI Engineering Manager on our Enterprise team, you'll be the technical bridge between Scale AI's cutting-edge AI capabilities and our most strategic customers. You'll work with enterprise clients to understand their unique challenges, lead a team that architects specific AI solutions, and ensure successful deployment and adoption of AI systems in production environments. This is a Management role that combines deep engineering and AI expertise, leading a team, and working on customer-facing problems. You'll work directly with customer engineering teams to integrate AI into their critical workflows. Key Responsibilities Customer Integration & Deployment - Partner directly with enterprise customers to understand their technical infrastructure, data pipelines, and business requirements - Design and implement custom integrations between Scale AI's platform and customer data environments (cloud platforms, data warehouses, internal APIs) - Build robust data connectors and ETL pipelines to ingest, process, and prepare customer data for AI workflows - Deploy and configure AI models and agents within customer security and compliance boundaries AI Agent Development - Develop production-grade AI agents tailored to customer use cases across domains like customer support, data analysis, content generation, and workflow automation - Architect multi-agent systems that orchestrate between different models, tools, and data sources - Implement evaluation frameworks to measure agent performance and iterate toward business objectives - Design human-in-the-loop workflows and feedback mechanisms for continuous agent improvement 
Prompt Engineering & Optimization - Create sophisticated prompt engineering strategies optimized for customer-specific domains and data - Build and maintain prompt libraries, templates, and best practices for customer use cases - Conduct systematic prompt experimentation and A/B testing to improve model outputs - Implement RAG (Retrieval Augmented Generation) systems and fine-tuning pipelines where appropriate Leadership & Collaboration - Serve as the Engineering Manager and technical point of contact for strategic enterprise accounts - Lead a team that is collaborating with customer data scientists, ML engineers, and software developers to ensure smooth integration - Work closely with Scale's product and engineering teams to translate customer needs into product improvements - Document technical architectures, integration patterns, and best practices Problem Solving & Innovation - Debug complex technical issues across the entire stack, from data pipelines to model outputs - Rapidly prototype solutions to unblock customers and prove out new use cases - Stay curr
ML/Research Engineer, Safeguards
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role We are looking for ML Engineers and Research Engineers to help detect and mitigate misuse of our AI systems. As a member of the Safeguards ML team, you will build systems that identify harmful use—from individual policy violations to sophisticated, coordinated attacks—and develop defenses that keep our products safe as capabilities advance. You will also work on systems that protect user wellbeing and ensure our models behave appropriately across a wide range of contexts. This work feeds directly into Anthropic's Responsible Scaling Policy commitments. Responsibilities - Develop classifiers to detect misuse and anomalous behavior at scale. This includes developing synthetic data pipelines for training classifiers and methods to automatically source representative evaluations to iterate on - Build systems to monitor for harms that span multiple exchanges, such as coordinated cyber attacks and influence operations, and develop new methods for aggregating and analyzing signals across contexts - Evaluate and improve the safety of agentic products—developing both threat models and environments to test for agentic risks, and developing and deploying mitigations for prompt injection attacks - Conduct research on automated red-teaming, adversarial robustness, and other research that helps test for or find misuse You may be a good fit if you - Have 4+ years of experience in ML engineering, research engineering, or applied research, in academia or industry - Have proficiency in Python and experience building ML systems - Are comfortable working across the research-to-deployment pipeline, from exploratory experiments to 
production systems - Are worried about misuse risks of AI systems, and want to work to mitigate them - Have strong communication skills and ability to explain complex technical concepts to non-technical stakeholders Strong candidates may also have experience with - Language modeling and transformers - Building classifiers, anomaly detection systems, or behavioral ML - Adversarial machine learning or red-teaming - Interpretability or probes - Reinforcement learning - High-performance, large-scale ML systems The annual compensation range for this role is listed below. For sales roles, the range provided is the role’s On Target Earnings ("OTE") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role. Annual Salary: $350,000 — $500,000 USD Logistics Minimum education: Bac
Senior Software Engineer, Full-Stack – Scale GP
Scale GP (Scale Generative AI Platform) is an enterprise-grade Generative AI platform providing APIs for knowledge retrieval, inference, evaluation, and more. We are seeking a strong Senior Full-Stack Engineer to help us build, scale, and refine our rapidly growing product. The ideal candidate is deeply grounded in software engineering best practices and experienced in developing and scaling modern web applications end-to-end. You will work across the stack—from React/TypeScript frontends to Python-based backends—while integrating with LLMs and machine learning systems. You will solve complex challenges in scalability, reliability, and product experience while owning significant product areas in a fast-paced environment. What You’ll Do - Own major full-stack product areas, driving features from design through production deployment. - Build modern frontend experiences using React and TypeScript, ensuring performance, usability, and responsiveness. - Develop reliable backend services in Python, working with distributed systems, data pipelines, and ML/LLM components. - Integrate with LLMs, vector databases, and AI infrastructure to power intelligent product experiences. - Deliver experiments and new features quickly, maintaining high quality and tight feedback loops with customers. - Collaborate across product, ML, and infrastructure teams to shape the direction of Scale GP. - Adapt quickly—learning new technologies, frameworks, and tools as needed across the stack. Ideal Experience - 5+ years of full-time engineering experience, post-graduation. - Strong experience developing full-stack applications using React, TypeScript, and Python. - Experience scaling or shipping products at high-growth startups. - Familiarity with LLMs, vector databases, embeddings, or other modern AI tooling (tinkering or production experience welcome). - Proficiency with SQL and modern API development. - Experience with Kubernetes, containerization, and microservice architectures. 
- Experience working with at least one major cloud provider (AWS, GCP, or Azure). Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range f
Machine Learning Research Engineer, GenAI Applied ML
About This Role Lead applied ML engineering on Scale's Applied ML team, powering data infrastructure for leading agentic LLMs (ChatGPT, Gemini, Llama). You will build scalable multi-agent systems to validate agentic reasoning and behaviors, scale human expertise, and drive research into real-world agent reliability failures despite strong benchmarks, shipping production fixes. Ideal for exceptional engineers with deep research rigor and a relentless focus on practical, high-impact systems. You will iterate rapidly with data, leverage AI tools to accelerate development, and collaborate tightly across engineering, product, and research. If you excel at turning frontier agent research into reliable deployed systems, we want to hear from you. You will: - Build and deploy multi-agent systems for agentic reasoning validation - Develop pipelines to detect errors and scale human judgment - Combine classical ML, LLMs, and multi-agent techniques for reliability - Lead research into agent failure modes and ship fixes - Use AI tools to speed prototyping and iteration - Build data-driven evaluations and deploy rapid improvements - Integrate systems into Scale's platform Ideally You’ll Have: - PhD or MSc in Computer Science, Mathematics, Statistics, or related field - 3+ years shipping scaled production ML systems - Demonstrated real-world impact - Mastery of PyTorch, TensorFlow, JAX, or scikit-learn - Deep expertise in agentic LLMs and multi-agent systems - Strong software engineering and microservices (AWS/GCP) - Rapid, data-driven iteration - Proficiency using AI tools to accelerate work - Strong research depth with practical bias - Excellent cross-functional communication Nice to Have: - Experience prototyping agent evaluation/reliability systems - Human-in-the-loop or annotation pipeline work - Open-source contributions in agents, evaluation, or alignment - Publications on agent reliability (NeurIPS, ICML, ICLR) Compensation packages at Scale for eligible roles include base 
salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend. Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, th
Data Science Manager, Integrity
ABOUT THE TEAM Integrity Data Science sits at the center of OpenAI’s mission to deploy powerful AI responsibly. We help ensure people can trust our products by building measurement systems, experimentation practices, and detection/mitigation strategies that protect OpenAI and our users from misuse, fraud, and evolving adversarial behaviors. As the scope and urgency of Integrity work expands across product surfaces and go-to-market motion, we’re hiring a dedicated Data Science Manager to scale the team, strengthen execution across multiple Integrity domains, and deepen partnership with Product, Engineering, Operations, and adjacent orgs (e.g., Growth, Ads). This role is based in our San Francisco HQ (in-office). ABOUT THE ROLE As Data Science Manager, Integrity, you will lead a team of data scientists working across trust & safety, fraud prevention, risk analysis, measurement, and modeling. You’ll be accountable for building a high-performing DS function that can keep pace with fast-moving threats—and for shaping the analytical strategy that informs how OpenAI detects, measures, and mitigates integrity risks at scale. This is a highly cross-functional leadership role. You’ll help set the roadmap with Integrity Product/Engineering leaders, evolve team structure and operating rhythms, raise the bar on technical rigor (experimentation, causal inference, modeling, metrics), and develop a culture of proactive, high-leverage impact. Many of the challenges in this space are emergent—new misuse patterns appear as the technology and ecosystem evolve—so this role requires strong judgment, comfort with ambiguity, and an ability to build systems that scale. IN THIS ROLE, YOU WILL: - Lead and scale a high-impact Integrity Data Science team—hiring, coaching, and developing DS ICs (and potentially future managers) while setting a strong technical and cultural bar. 
- Drive strategy across multiple Integrity domains (policy enforcement, bot detection, fraud prevention, IP theft, risk measurement, abuse prevention), balancing near-term response with durable systems. - Build and institutionalize analytical rigor: clear metric frameworks, experimentation standards, monitoring/alerting, and repeatable evaluation approaches for Integrity interventions. - Partner deeply with Product & Engineering to shape roadmaps, prioritize the right bets, and translate ambiguous risk signals into practical product and platform decisions. - Evolve team structure and operating model as the org scales—defining ownership boundaries, improving processes, and creating leverage through better tooling and AI-assisted workflows. - Enable cross-org outcomes, supporting partners outside Integrity (e.g., Growth, Ads, GTM) where integrity risks intersect with product and business goals. - Communicate clearly with senior leadership, synthesizing complex tradeoffs, surfacing risk, and driving alignment on priorities and success metrics. - Push the team toward an AI-leveraged operating mode, using modern tooling and model capabilities to accelerate detection, triage, analysis, and iteration. YOU MIGHT THRIVE IN THIS ROLE IF YOU: - Have deep experience leading and scaling Data Science teams, ideally in trust & safety, fraud/abuse, security, risk, or other adversarial problem spaces in fast-moving environments. - Bring strong technical grounding across modern DS techniques (experimentation, causal inference, anomaly detection, risk modeling, measurement design) and can coach others to execute with rigor. - Have a track record of building durable partnerships across DS, Engineering, Product, and Operations—able to influence without authority and create shared accountability. - Are excellent at hiring, mentoring, and developing technical talent, and can build a culture that is both high-bar and supportive. 
- Can translate messy, evolving threats into clear frameworks, metrics, and decisions—and keep the team focused on the highest-leverage work. - Are comfortable operating in ambigu
Data Scientist, Codex
ABOUT THE TEAM Codex is OpenAI’s first-party developer product focused on agentic software engineering. We’re building tools that help engineers design, write, test, and ship code faster—safely and at scale. We partner tightly with research and product to translate model advances into tangible developer productivity. ABOUT THE ROLE As a Data Scientist on Codex, you will measure and accelerate product-market fit for AI developer tools. You’ll define what “developer productivity” means for our product, run experiments on new coding models and UX, and pinpoint where the model helps or hurts across languages and tasks. Your insights will directly shape how an entire industry builds software. This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees. IN THIS ROLE, YOU WILL - Embed with the Codex product team to discover opportunities that improve developer outcomes and growth - Design and interpret A/B tests and staged rollouts of new coding models and product features - Define and operationalize metrics such as suggestion acceptance, edit distance, compile/test pass rates, task completion, latency, and session productivity - Build dashboards and analyses that help the team self-serve answers to product questions (by language, framework, repo size, task type) - Diagnose failure modes and partner with Research on targeted improvements (model quality signals, user feedback, evals) YOU MIGHT THRIVE IN THIS ROLE IF YOU HAVE - 5+ years in a quantitative role at a developer-facing or high-growth product - Fluency in SQL and Python; comfort with experiment design and causal inference - Experience defining product metrics tied to user value - Ability to communicate clearly with PM, Eng, and Design—and to influence product direction YOU COULD BE AN ESPECIALLY GREAT FIT IF YOU HAVE - Strong programming background; ability to prototype, run simulations, and reason about code quality - 
Familiarity with IDE/extension telemetry or developer tooling analytics - Prior experience with NLP/LLMs, code models, or evaluations for generative coding About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement https://cdn.openai.com/policies/eeo-policy-statement.pdf. Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. 
For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary,
Research Engineer, Interpretability
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role: When you see what modern language models are capable of, do you wonder, "How do these things work? How can we trust them?" The Interpretability team at Anthropic is working to reverse-engineer how trained models work because we believe that a mechanistic understanding is the most robust way to make advanced systems safe. Think of us as doing "neuroscience" of neural networks using "microscopes" we build - or reverse-engineering neural networks like binary programs. More resources to learn about our work: - Our research blog - covering advances including Monosemantic Features and Circuits - An Introduction to Interpretability from our research lead, Chris Olah - The Urgency of Interpretability from CEO Dario Amodei - Engineering Challenges Scaling Interpretability - directly relevant to this role - 60 Minutes segment - Around 8:07, see a demo of tooling our team built - New Yorker article - what it's like to work on one of AI's hardest open problems Even if you haven’t worked on interpretability before, the infrastructure expertise is similar to what's needed across the lifecycle of a production language model: - Pretraining: Training dictionary learning models looks a lot like model pretraining - creating stable, performant training jobs for massively parameterized models across thousands of chips - Inference: Interp runs a customized inference stack. Day-to-day analysis requires services that allow editing a model's internal activations mid-forward-pass - for example, adding a "steering vector" - Performance: Like all LLM work, we push up against the limits of hardware and software. 
Rather than squeezing the last 0.1%, we are focused on finding bottlenecks, fixing them, and moving ahead, given our rapidly evolving research and safety mission. The science keeps scaling - and it's now applied directly in safety audits on frontier models, with real deadlines. As our research has matured, engineering and infrastructure have become a bottleneck. Your work will have a direct impact on one of the most important open problems in AI. Responsibilities: - Build and maintain the specialized inference and training infrastructure that powers interpretability research - including instrumented forward/backward passes, activation extraction, and steering vector a
Applied AI Engineer, Global Public Sector
Scale’s rapidly growing Global Public Sector team is focused on using AI to address critical challenges facing the public sector around the world. Our core work consists of: - Creating custom AI applications that will impact millions of citizens - Generating high-quality training data for national LLMs - Upskilling and advisory services to spread the impact of AI We are hiring Applied AI Engineers to build custom end-to-end AI applications for our public sector clients using the latest developments in the field of AI. You will also have the opportunity to help create custom datasets and evaluations and to fine-tune these sophisticated models, maximizing their performance on real-world use cases with global reach. At Scale, we’re not just building AI solutions; we are building repeatable blocks to enable the public sector to transform their operations and better serve citizens through cutting-edge technology. If you’re ready to shape the future of AI in the public sector and be a member of our rapidly expanding team, we’d love to hear from you.
You will: - Partner with public sector clients to deeply understand their challenges and define AI-driven solutions - Build and deploy end-to-end AI applications into production, leveraging the latest developments from the biggest AI labs and open source models - Collaborate with cross-functional teams, including data annotation specialists, to create high-quality training datasets - Design and maintain robust evaluation frameworks to ensure the reliability and effectiveness of AI models - Participate in customer engagements, including occasional travel (approximately two weeks per quarter) - Contribute to the scaling of AI capabilities in the public sector through hands-on knowledge sharing Ideally you’d have: - A strong engineering background, with a Bachelor’s degree in Computer Science, Mathematics, or a related quantitative field (or equivalent practical experience) - 7+ years of post-graduation engineering experience, with demonstrated proficiency in languages such as Python, TypeScript/JavaScript, Java, or C++ - 2+ years of experience applying AI/ML in production environments, such as deploying deep learning solutions, building generative/agentic AI applications, or setting up evaluation pipelines - Familiarity with cloud-based machine learning tools and platforms (e.g. AWS, GCP, Azure) - Strong problem-solving skills, with a data-driven approach to iterating on machine learning models and datasets - Excellent written and verbal communication skills to collaborate effectively in a cross-functional environment Nice to haves: - Experience working at a startup, particularly as a founding engineer - Experience building and deploying large-scale AI solutions - Strong written and verbal communication skills to operate in a cross-functional team environment - Proficiency in Arabic (if focused on language models) PLEASE NOTE: Our policy requires a 90-day waiting period before reconsidering candidates for the same role. 
This allows us to ensure a fair and thorough evaluation of all applicants. About Us: At Scale, our mission is
Staff Software Engineer, Full-Stack - Enterprise Gen AI
Scale GP (Scale Generative AI Platform) is an enterprise-grade AI platform providing APIs for knowledge retrieval, inference, evaluation, and more. We are looking for a frontend-focused full-stack engineer to help build AI-powered applications that redefine enterprise workflows and push the boundaries of interactive AI. This role is ideal for someone who thrives in a fast-paced environment, enjoys working on a diverse set of projects, and has a passion for crafting high-quality, intuitive user experiences. At Scale, you'll work on a mix of cutting-edge customer-facing AI applications and internal SaaS products. Our engineering team powers projects like TIME’s Person of the Year AI experience (see it in action), where our AI technology helped shape one of the most iconic features in media. You'll also contribute to Scale’s GenAI Platform (SGP), a powerful system that enables businesses to build and deploy AI agents at scale. Whether it’s developing interactive AI assistants, enterprise-grade web applications, or refining our core SaaS platform, you’ll play a crucial role in shaping how AI integrates into real-world applications. 
You Will: - Build and enhance user-facing AI applications for major enterprise customers, including high-profile media and Fortune 500 companies - Develop and refine features for Scale’s GenAI Platform, empowering businesses to build, deploy, and manage AI-driven agents - Design, build, and optimize polished, high-performance UIs using Next.js, React, TypeScript, and Tailwind - Work closely with product managers, designers, and AI/ML teams to create seamless, intuitive, and impactful user experiences - Integrate frontend applications with backend services, working with APIs, authentication systems, and cloud-based infrastructure - Ship features at a rapid pace while maintaining a high level of code quality, performance, and accessibility Ideally, You Have: - 5+ years of experience developing frontend or fullstack applications in a modern tech stack - Strong proficiency in Next.js, React, TypeScript, and Tailwind, with an eye for building polished, user-friendly interfaces - Experience working on high-visibility, customer-facing applications and making trade-offs between speed and quality in fast-paced environments - A passion for AI and experience working on interactive AI applications, agent-based systems, or data-rich web platforms - Familiarity with backend technologies such as FastAPI, PostgreSQL, GraphQL, and cloud infrastructure like AWS, Azure, or GCP - A track record of collaborating cross-functionally with design, product, and ML teams to bring AI-powered applications to life This role is a unique opportunity to shape the future of AI-powered user experiences, working on projects that impact millions of users while developing tools that empower businesses to deploy AI at scale. If you’re excited by the intersection of AI, frontend engineering, and product design, we’d love to hear from you. The base salary range for this f
Infrastructure Software Engineer, Enterprise GenAI
Scale GP (Scale Generative AI Platform) is an enterprise-grade AI platform that provides APIs for knowledge retrieval, inference, evaluation, and more. We are looking for a strong engineer to join our team and help us build and scale our core infrastructure in a fast-paced environment. The ideal candidate will have a strong understanding of software engineering principles and practices, as well as experience with large-scale distributed systems. You will implement solutions across multiple cloud providers (GCP, Azure, AWS) for customers in diverse, highly-regulated industries like healthcare, telecom, finance, and retail. What You’ll Do: - Architect multi-cloud systems and abstractions to allow the SGP platform to run on top of existing Cloud providers - Implement custom integrations between Scale AI's platform and customer data environments (cloud platforms, data warehouses, internal APIs) - Collaborate with platform, product teams and our customers directly to develop and implement innovative infrastructure that scales to meet evolving needs. - Deliver experiments at a high velocity and level of quality to engage our customers - Work across the entire product lifecycle from conceptualization through production - Be able, and willing, to multi-task and learn new technologies quickly What We’re Looking For: - 4+ years of full-time engineering experience, post-graduation - Experience scaling products at hyper growth startups - Experience tinkering with or productizing LLMs, vector databases, and the other latest AI technologies - Proficient in Python or Javascript/Typescript, and SQL - Experience with Kubernetes - Experience with major cloud providers (AWS, Azure, GCP) - Excellent communication skills with the ability to explain technical concepts to both technical and non-technical audiences Compensation packages at Scale for eligible roles include base salary, equity, and benefits. 
The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend. Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in the locations of San Francisco, New York, Seattle is: $216,000 - $270,000 USD PLEASE NOTE:
Deep Research Agent Tech Lead
Scale AI is seeking a highly technical and strategic Staff / Senior Staff Machine Learning Engineer to act as the Tech Lead (TL) for our next generation of deep research agents for the Enterprise. This high-impact role will drive the technical direction and oversight for Deep Research Agent Development, translating cutting-edge research in Generative AI, Large Language Models (LLMs), and Agentic Frameworks into robust, scalable, and high-impact production systems that enhance enterprise operations, analytics, and core efficiency. The ideal candidate thrives in a fast-paced environment, has a passion for both deep technical work and mentoring, and is capable of setting a long-term technical strategy for a critical domain while maintaining a strong, hands-on delivery focus. Responsibilities Technical Leadership & Vision - Set the Technical Roadmap: Define and own the technical strategy, architecture, and roadmap for Deep Research Agents for the Enterprise, ensuring alignment with Scale AI’s overall AI strategy and business goals. - Drive Breakthrough Research to Production: Lead end-to-end development, from initial research through production deployment to customer impact, with a focus on integrating diverse data modalities. - Core Agent Capabilities Development: - Advanced Knowledge Retrieval: Architect and implement state-of-the-art retrieval systems to ensure the agents provide accurate and comprehensive answers from public and proprietary enterprise data sources. - Data analysis: Design and champion the development of data analysis agents that accurately translate complex natural language queries into executable SQL/code against diverse enterprise data schemas. - Multimodal Intelligence: Lead the integration of Multimodal AI capabilities to process and extract structured information from visual documents, tables, and forms, enriching the agent's knowledge base. 
- Architecture & Design: Design and champion highly scalable, reliable, and low-latency infrastructure and frameworks for building, orchestrating, and evaluating multi-agent systems at enterprise scale. - Technical Excellence: Serve as the technical authority for the team, leading design reviews, defining ML engineering best practices, and ensuring code quality, security, and operational excellence for all agent systems. Team Leadership & Mentorship - Lead and Mentor: Technically lead and mentor a team of Machine Learning Engineers and Research Scientists, fostering a culture of innovation, rigorous engineering, rapid iteration, and technical depth. - Recruiting & Growth: Partner with management to hire, onboard, and grow top-tier talent, helping to shape the long-term structure and capabilities of the team. - Cross-Functional Influence: Collaborate effectively with Product Managers, Data Scientists, and other engineering/science teams to translate ambiguous, high-level business problems into concrete, executable technical specifications and impactful agent sol
Research Engineer, Machine Learning (Reinforcement Learning)
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the teams Our Reinforcement Learning teams lead Anthropic's reinforcement learning research and development, playing a critical role in advancing our AI systems. We've contributed to all Claude models, with significant impacts on the autonomy and coding capabilities of Claude Sonnet 4.5 and Opus 4.5. Our work spans several key areas: - Developing systems that enable models to use computers effectively - Advancing code generation through reinforcement learning - Pioneering fundamental RL research for large language models - Building scalable RL infrastructure and training methodologies - Enhancing model reasoning capabilities We collaborate closely with Anthropic's alignment and frontier red teams to ensure our systems are both capable and safe. We partner with the applied production training team to bring research innovations into deployed models, and are dedicated to implementing our research at scale. Our Reinforcement Learning teams sit at the intersection of cutting-edge research and engineering excellence, with a deep commitment to building high-quality, scalable systems that push the boundaries of what AI can accomplish. About the Role As a Research Engineer within Reinforcement Learning, you will collaborate with a diverse group of researchers and engineers to advance the capabilities and safety of large language models. This role blends research and engineering responsibilities, requiring you to both implement novel approaches and contribute to the research direction. 
You'll work on fundamental research in reinforcement learning, creating 'agentic' models via tool use for open-ended tasks such as computer use and autonomous software generation, improving reasoning abilities in areas such as mathematics, and developing prototypes for internal use, productivity, and evaluation. Representative projects: - Architect and optimize core reinforcement learning infrastructure, from clean training abstractions to distributed experiment management across GPU clusters. Help scale our systems to handle increasingly complex research workflows. - Design, implement, and test novel training environments, evaluations, and methodologies for reinforcement learning agents which push the state of the art for the next generation of models. - Drive performance improvements across our stack through profiling, optimization, and benchmarking. Implement efficient caching solutions and debug distributed systems to accelerate both training and evaluation workflows. - Collaborate across research and engineering teams to develop automated testing frameworks, design clean APIs, and build scalable infrastructure that accelerates AI research. You may be a good fit if you: - Are proficient in Python and async/concurrent programming with frameworks like Trio - Have experience with machine learning frameworks (PyTorch, TensorFlow, JAX) - Have industry experience in machine learning research - Can balance research exploration with engineering implementation
Hardware / Software CoDesign Engineer - 3P
About the Team OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI’s supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI. About the Role As an Engineer on our hardware optimization and co-design team, you will co-design future hardware from different vendors for programmability and performance. You will work with our kernel, compiler and machine learning engineers to understand their unique needs related to ML techniques, algorithms, numerical approximations, programming expressivity, and compiler optimizations. You will evangelize these constraints with various vendors to develop and influence future hardware architectures towards efficient training and inference on our models. If you are excited about efficiently distributing a large language model across devices, dealing with and optimizing system-wide/rack-wide networking bottlenecks and eventually tailoring the compute pipe and memory hierarchy of the hardware platform, simulating workloads at different abstractions and working closely with our partners, this is the perfect opportunity! This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees. 
Key Responsibilities - Co-design future hardware for programmability and performance with our hardware vendors - Assist hardware vendors in developing optimal kernels and add support for them in our compiler - Develop performance estimates for critical kernels for different hardware configurations and drive decisions on compute core and memory hierarchy features - Build system performance models at different abstraction levels and carry out analysis to drive decisions on scale up, scale out, front end networking - Work with machine learning engineers, kernel engineers and compiler developers to understand their vision and needs from high performance accelerators - Manage communication and coordination with internal and external partners - Influence the roadmap of hardware partners to optimize them for OpenAI’s workloads. - Evaluate potential partners’ accelerators and platforms. - As the scope of the role and team grows, understand and influence roadmaps for hardware partners for our datacenter networks, racks, and buildings. Qualifications - 4+ years of industry experience, including experience harnessing compute at scale and optimizing ML platform code to run efficiently on target hardware. - Strong experience in software/hardware co-design - Deep understanding of GPU and/or other AI accelerators - Experience with CUDA, Triton or a related accelerator programming language - Experience driving Machine Learning accuracy with low precision formats - Experience with system performance modeling and analysis to optimize ML model deployment - Strong coding skills in C/C++ and Python - Familiarity with the fundamentals of deep learning computing and chip architecture/microarchitecture. - Able to actively collaborate with ML engineers, kernel writers, compiler developers, system engineers, chip architects/microarchitects Preferred Skills - PhD in Computer Science and Engineering with a specialization in Computer Architecture, Parallel Computing,
Compilers, or other Systems - Strong understanding of LLMs and challenges related to their training and inference About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world
Machine Learning Fellow - Human Frontier Collective (UK)
PLEASE NOTE: This is a fully remote, 1099 independent contractor opportunity with an estimated duration of six months and the potential for extension. To be eligible, candidates must be authorized to work in the country they reside in. About the Program The Human Frontier Collective (HFC) Fellowship brings together top researchers and domain experts to collaborate on high-impact work that is shaping the future of AI. As an HFC Fellow, you’ll apply your academic and professional expertise to help design, evaluate, and interpret advanced generative AI systems, while gaining exposure to cutting-edge research and working alongside an interdisciplinary network of leading thinkers. What You'll Do - ML Projects: Get invited to engage in high-impact projects with our partnered AI labs and platforms. Help models understand real-world deep learning workflows by designing, reviewing, and optimizing PyTorch models, evaluating complex ML code and AI-generated implementations for efficiency and correctness, and advising on GPU optimization, scaling, and trade-offs. - HFC Community: Beyond the work, you’ll become part of a supportive, interdisciplinary network of innovators and thought leaders committed to advancing frontier AI across domains. - Contribute to Research Publications: Collaborate with Scale’s research team to co-author technical reports and research papers, boosting your academic visibility and professional recognition (e.g., SciPredict, PropensityBench, Professional Reasoning Benchmark). Who Should Apply - Education: PhD or postdoctoral degree in Computer Science, Computer Engineering, or a related field. - Professional Background: 1-3+ years of experience as a Machine Learning Engineer or Data Scientist. - Skills: Strong proficiency in Python and modern ML frameworks (PyTorch, TensorFlow). Experience with cloud infrastructure (AWS) and MLOps tools (Docker, LangChain) is a plus. 
- Professional Mindset: Detail-oriented, innovative thinker with a passion for applied AI research and a commitment to collaboration. Why Join the HFC? - Professional Development: High-impact experts expand their influence through review projects, advisory roles, and research, while deepening their AI expertise, strengthening analytical and problem-solving skills, and engaging with pioneering AI applications in science and technology. - Join a Top-Tier Network: Collaborate with a global network of engineers and experts to advance responsible AI through impactful, flexible research and training. 80% of our members come from leading institutions. - Flexible Schedule: Set your own schedule, with flexible 10–40 hour weeks that fit around your life and other commitments. - Competitive Pay: Project pay rates vary across platforms and depend on a number of factors, including but not limited to: projects, scope, skillset, and loca
Tech Lead Manager - MLRE, ML Systems
Scale's LLM post-training platform team builds our internal distributed framework for large language model training. The platform powers MLEs, researchers, data scientists, and operators for fast and automatic training and evaluation of LLMs. It also serves as the underlying training framework for the data quality evaluation pipeline. Scale is uniquely positioned at the heart of the field of AI as an indispensable provider of training and evaluation data and end-to-end solutions for the ML lifecycle. You will work closely with Scale’s ML teams and researchers to build the foundation platform which supports all our ML research and development work. You will be building and optimizing the platform to enable our next generation LLM training, inference and data curation. If you are excited about shaping the future of AI via fundamental innovations, we would love to hear from you! You will: - Build, profile and optimize our training and inference framework. - Collaborate with ML and research teams to accelerate their research and development, and enable them to develop the next generation of models and data curation. - Research and integrate state-of-the-art technologies to optimize our ML system. Ideally you’d have: - A passion for system optimization - Experience with multi-node LLM training and inference - Experience with developing large-scale distributed ML systems - Experience with post-training methods like RLHF/RLVR and related algorithms like PPO/GRPO etc. - Strong software engineering skills, proficient in frameworks and tools such as CUDA, PyTorch, transformers, flash attention, etc. - Strong written and verbal communication skills to operate in a cross-functional team environment. Nice to haves: - Demonstrated expertise in post-training methods and/or next generation use cases for large language models including instruction tuning, RLHF, tool use, reasoning, agents, and multimodal, etc. 
Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend. Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in the locations of San Francisco, New York, Seattle is: $264,800 - $331,000 USD
Tech Lead/Manager, Machine Learning Research Scientist - LLM Evals
As the leading data and evaluation partner for frontier AI companies, Scale is dedicated to advancing the evaluation and benchmarking of large language models (LLMs). We are building industry-leading LLM evals, setting new standards for model performance assessment. Our mission is to develop rigorous, scalable, and fair evaluation methodologies to drive the next generation of AI capabilities. Our Research teams work with the industry’s leading AI labs to provide high quality data and accelerate progress in GenAI research. As the Tech Lead Manager of the LLM Evals Research team, you will lead a talented team of research scientists and research engineers focused on developing and implementing novel evaluation methodologies, metrics, and benchmarks to assess the capabilities and limitations of our cutting-edge LLMs. This role is critical for designing and executing a roadmap that defines best practices in data driven AI development and will accelerate the next generation of generative AI models in partnership with top foundational model labs. You will: - Lead a team of highly effective research scientists and research engineers on LLM evals. - Conduct research on the effectiveness and limitations of existing LLM evaluation techniques. - Design and develop novel evaluation benchmarks for large language models, covering areas such as instruction following, factuality, robustness, and fairness. - Communicate, collaborate, and build relationships with clients and peer teams to facilitate cross-functional projects. - Collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols. - Implement scalable and reproducible evaluation pipelines using modern ML frameworks. - Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives. 
- Remain up-to-date on ongoing research in the team, help work through technical challenges, and be involved in design decisions - Remain deeply involved in the research community, both understanding trends and setting them - Thrive in a high-energy, fast-paced startup environment and be ready to dedicate the time and effort needed to drive impactful results. Ideally you'd have: - 5+ years of hands-on experience in large language models, NLP, and Transformer modeling, in both research and engineering settings - A track record of landing major research impact in a fast-paced environment - Experience supporting and leading a team of research scientists and research engineers - Excellent written and verbal communication skills - Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals - Previous experience in a customer facing role. Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also grante
Research Engineer/Research Scientist, Pre-training
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. Anthropic is at the forefront of AI research, dedicated to developing safe, ethical, and powerful artificial intelligence. Our mission is to ensure that transformative AI systems are aligned with human interests. We are seeking a Research Engineer to join our Pre-training team, responsible for developing the next generation of large language models. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems. Key Responsibilities: - Conduct research and implement solutions in areas such as model architecture, algorithms, data processing, and optimizer development - Independently lead small research projects while collaborating with team members on larger initiatives - Design, run, and analyze scientific experiments to advance our understanding of large language models - Optimize and scale our training infrastructure to improve efficiency and reliability - Develop and improve dev tooling to enhance team productivity - Contribute to the entire stack, from low-level optimizations to high-level model design Qualifications: - Advanced degree (MS or PhD) in Computer Science, Machine Learning, or a related field - Strong software engineering skills with a proven track record of building complex systems - Expertise in Python and experience with deep learning frameworks (PyTorch preferred) - Familiarity with large-scale machine learning, particularly in the context of language models - Ability to balance research goals with practical engineering constraints - Strong problem-solving skills and a results-oriented mindset - 
Excellent communication skills and ability to work in a collaborative environment - Care about the societal impacts of your work Preferred Experience: - Work on high-performance, large-scale ML systems - Familiarity with GPUs, Kubernetes, and OS internals - Experience with language modeling using transformer architectures - Knowledge of reinforcement learning techniques - Background in large-scale ETL processes You'll thrive in this role if you: - Have significant software engineering experience - Are results-oriented with a bias towards flexibility and impact - Willingly take on tasks outside your job description to support the team - Enjoy pair programming and collaborative work - Are eager to learn more about machine learning research - Are enthusiastic to work at an organization that functions as a single, cohesive team pursuing large-scale AI research projects - Are working to align state of the art models with human values and preferences, understand and interpret deep neural networks, or develop new models to support these areas of research - View research and engineering as
Senior Research Scientist, Reward Models
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role As a Senior Research Scientist on our Reward Models team, you'll lead research efforts to improve how we specify and learn human preferences at scale. Your work will directly shape how our models understand and optimize for what humans actually want — enabling Claude to be more useful, more reliable, and better aligned with human values. This role focuses on pushing the frontier of reward modeling for large language models. You'll develop novel architectures and training methodologies for RLHF, research new approaches to LLM-based evaluation and grading (including rubric-based methods), and investigate techniques to identify and mitigate reward hacking. You'll collaborate closely with teams across Anthropic, including Finetuning, Alignment Science, and our broader research organization, to ensure your work translates into concrete improvements in both model capabilities and safety. We're looking for someone who can drive ambitious research agendas while also shipping practical improvements to production systems. You'll have the opportunity to work on some of the most important open problems in AI alignment, with access to frontier models and significant computational resources. Your work will directly advance the science of how we train AI systems to be both highly capable and safe. Note: For this role, we conduct all interviews in Python. 
Responsibilities - Lead research on novel reward model architectures and training approaches for RLHF - Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability - Research techniques to detect, characterize, and mitigate reward hacking and specification gaming - Design experiments to understand reward model generalization, robustness, and failure modes - Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines - Contribute to research publications, blog posts, and internal documentation - Mentor other researchers and help build institutional knowledge around reward modeling You may be a good fit if you - Have a track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning - Have experience training and evaluating reward models for large language models - Are comfortable designing and running large-scale experiments with significant computational resources - Can work effectively across research and engineering, iterating quickly while maintaining scientific rigor - Enjoy collaborative research and can communicate complex ideas clearly to diverse audiences - Care deeply about building AI systems that are both highly capable and safe Strong candidates may also - Have published research on reward modeling, preference learning, or RLHF - Have experience with LLM-as-judge approaches, including calibration and reliabili
Research Engineer, Pretraining Scaling
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the Role: Anthropic's ML Performance and Scaling team trains our production pretrained models, work that directly shapes the company's future and our mission to build safe, beneficial AI systems. As a Research Engineer on this team, you'll ensure our frontier models train reliably, efficiently, and at scale. This is demanding, high-impact work that requires both deep technical expertise and a genuine passion for the craft of large-scale ML systems. This role lives at the boundary between research and engineering. You'll work across our entire production training stack: performance optimization, hardware debugging, experimental design, and launch coordination. During launches, the team works in tight lockstep, responding to production issues that can't wait for tomorrow. 
Responsibilities: - Own critical aspects of our production pretraining pipeline, including model operations, performance optimization, observability, and reliability - Debug and resolve complex issues across the full stack—from hardware errors and networking to training dynamics and evaluation infrastructure - Design and run experiments to improve training efficiency, reduce step time, increase uptime, and enhance model performance - Respond to on-call incidents during model launches, diagnosing problems quickly and coordinating solutions across teams - Build and maintain production logging, monitoring dashboards, and evaluation infrastructure - Add new capabilities to the training codebase, such as long context support or novel architectures - Collaborate closely with teammates across SF and London, as well as with Tokens, Architectures, and Systems teams - Contribute to the team's institutional knowledge by documenting systems, debugging approaches, and lessons learned You May Be a Good Fit If You: - Have hands-on experience training large language models, or deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems - Genuinely enjoy both research and engineering work—you'd describe your ideal split as roughly 50/50 rather than heavily weighted toward one or the other - Are excited about being on-call for production systems, working long days during launches, and solving hard problems under pressure - Thrive when working on whatever is most impactful, even if that changes day-to-day based on what the production model needs - Excel at debugging complex, ambiguous problems across multiple layers of the stack - Communicate clearly and collaborate effectively, especially when coordinating across time zones or during high-stress incidents - Are passionate about the work itself and want to refine your craft as a research engineer - Care about the societal impacts of AI and responsible scaling Strong Candidates May Also Have: - Previous experience 
training LLMs or working extensively with JAX/TPU, PyTorch, or other ML frameworks at scale - Contributed to open-source LLM frame
Forward Deployed Engineer, GenAI
About Scale AI At Scale AI, our mission is to accelerate the development of AI applications. For 8 years, Scale has been the leading AI data foundry, helping fuel the most exciting advancements in AI, including generative AI, defense applications, and autonomous vehicles. With our recent Series F round, we’re accelerating the abundance of frontier data to pave the road to Artificial General Intelligence (AGI) and building upon our prior model evaluation work with enterprise customers and governments to deepen our capabilities and offerings for public and private evaluations. About Data Engine Our Generative AI Data Engine powers the world’s most advanced LLMs and generative models through world-class RLHF (Reinforcement Learning with Human Feedback), human data generation, model evaluation, safety, and alignment. The data we produce is some of the most critical work for how humanity will interact with AI. About Our FDE Team Generating high-quality data is the core problem our business solves. We aim to make producing and delivering high-quality data seamless and efficient for operators and customers. Our Team is building customer and operator-specific infrastructure to provide high-quality data with low turnaround time. You'll be exposed to the cutting edge of the Generative AI industry while directly interfacing with the leading model-building organizations in the space, including the top AI research labs and government agencies. Join us in shaping the future of Artificial General Intelligence. As a Forward Deployed Engineer, you'll be at the forefront of providing the critical data infrastructure that powers the most advanced AI models, directly influencing how humanity interacts with AI. You will work with the world’s leading AI companies and government agencies to solve their most complex AI data-related problems. 
Responsibilities: - Drive Impact: Directly contribute to the advancement of AI by delivering critical data solutions for leading AI innovators and government agencies. - Customer Collaboration: Interact daily with our technical customers, understanding their unique challenges and translating them into impactful solutions. - End-to-End Development: Design, build, and deploy features across the entire stack, from front-end interfaces to back-end systems and infrastructure. - Rapid Experimentation: Deliver high-quality experiments quickly, iterating to meet customer needs and drive innovation. - Strategic Influence: Play a key role in shaping our engineering culture, values, and processes, contributing to the growth of our team and the evolution of our product. - Diverse Projects: Engage in a dynamic mix of designing and deploying cutting-edge data solutions, collaborating with leading AI researchers, and directly influencing the product roadmap. You'll work on everything from large-scale system architecture to customer-facing front-end application design. - Leadership Growth: This role offers a unique opportunity to lead critical projects, shape our engineering culture, and accelerate your career growth in the rapidly evolving field of Generative AI. You'll be positioned to become a future leader in a company defining the next era of technology. Requirements: - At least 2 years of relevant experience is preferred - Proven track record of shipping high-quality products and features at scale. - Strong problem-solving skills and the ability to work independently or as part of a collabo
Principal AI Ops Architect, GPS
Role Overview Scale’s rapidly growing Global Public Sector team is focused on using AI to address critical challenges facing the public sector around the world. Our core work consists of: - Creating custom AI applications that will impact millions of citizens - Generating high-quality training data for national LLMs - Upskilling and advisory services to spread the impact of AI As a Principal AI Ops Architect, you will design and develop the production lifecycle of full-stack AI applications, while supporting end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and the resilient cloud infrastructure required for our international government partners. At Scale, we’re not just building AI solutions—we’re enabling the public sector to transform their operations and better serve citizens through cutting-edge technology. If you’re ready to shape the future of AI in the public sector and be a founding member of our team, we’d love to hear from you. You will: - Own the production outcome: Take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies. - Ensure Full-Stack integrity: Oversee the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive and production-ready environment. - Scale the feedback loop: Build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring the right levels of reliability. - Navigate global compliance: Manage the technical lifecycle within diverse regulatory frameworks. - Incident command: Lead the response for production issues in mission-critical environments, ensuring rapid resolution and building the guardrails to prevent them from happening again. 
- Bridge the gap: Translate deep technical performance metrics into clear insights for senior international government officials. - Drive product evolution: Partner with our Engineering and ML teams to ensure the lessons learned in the field directly influence the technical architecture and decisions of future use cases. Ideally, you have: - Experience: 6+ years in a high-impact technical role (SRE, FDE or MLOps) with experience in the public sector. - Global perspective: Familiarity with international government security standards and the complexities of deploying sovereign AI. - System architecture proficiency: Proven experience maintaining production-grade applications with a deep understanding of the full request lifecycle, connecting frontend/API layers to the backend and AI core. - Modern AI Stack expertise: Proficiency in coding and in modern AI infrastructure, including Kubernetes, vector databases, agentic development, and LLM observability tools. - Ownership: You treat every production deployment as your own. You race toward solving hard problems before the customer even sees them. - Reliability: You understand that in the public sector, a model failu
Research Engineer, RL Infrastructure (Knowledge Work)
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role The Knowledge Work team builds the training environments and evaluations that make Claude effective at real-world professional workflows — searching, analyzing, and creating across the tools and documents knowledge workers use every day. As that work scales, the systems behind it need to be as rigorous as the research itself. We are looking for a Research Engineer to own the reliability, observability, and infrastructure foundation that the team's research depends on. You will be responsible for ensuring our training and evaluation runs remain stable, well-instrumented, and high-quality as they grow in scale and complexity. A core part of this role is shifting reliability work from reactive to proactive: hardening systems, stress-testing at realistic scale, and building the observability and tooling that surface problems early — so researchers can stay focused on research rather than incident response. You will be the team's stable, context-rich owner for environment health and evaluation integrity, and the primary point of contact for partner teams when issues arise. Where this role focuses: While you'll work closely with researchers building new training environments, the priority for this role is the reliability those environments depend on. It's best suited to an engineer who finds real ownership and impact in making critical systems dependable, and in being the person behind trustworthy evaluation results the entire organization relies on. 
Key Responsibilities: - Serve as the dedicated reliability owner for the Knowledge Work training environments, providing continuity of context and reducing the operational overhead of rotating ownership - Own a clean, canonical set of evaluation tools and processes for Knowledge Work capabilities, including the process used for model releases - Build and automate observability, dashboards, and operational tooling for our training environments and evaluation systems, with an emphasis on high signal-to-noise: a small set of trusted metrics and alerts rather than sprawling instrumentation - Proactively harden environments and evaluation systems through load testing, fault injection, and stress testing at realistic scale, so failures surface early rather than during critical training work - Act as the primary point of contact for partner training and infrastructure teams when issues in our environments arise, and drive incidents to resolution - Reduce the operational burden on researchers so they can stay focused on research Minimum Qualifications: - Highly experienced Python engineer who ships reliable, well-instrumented code that teammates trust in production - Demonstrated experience operating ML or distributed systems at scale, including significant on-call and incident-response experience - Strong SRE or production-engineering mindset, reaching for SLOs, load tests, and failure injection before reaching for more dashboards
Research Scientist, AI Controls and Monitoring
Scale Labs, Research Scientist — AI Controls and Monitoring As the leading data and evaluation partner for frontier AI companies, Scale plays an integral role in understanding the capabilities and safeguarding AI models and systems. Building on this expertise, Scale Labs has launched a new team focused on policy research, to bridge the gap between AI research and global policymakers to make informed, scientific decisions about AI risks and capabilities. Our research tackles the hardest problems in agent robustness, AI control protocols, and AI risk evaluations to help governments, industry, and the public understand and mitigate AI risk while maximizing AI adoption. This team collaborates broadly across industry, the public sector, and academia and regularly publishes our findings. We are actively seeking talented researchers to join us in shaping this vision. As a Research Scientist focused on AI Controls and Monitoring, you will design methods, systems, and experiments to ensure that advanced AI models and agents remain aligned with intended goals, even in high-stakes or adversarial environments. For example, you might: - Develop monitoring techniques and observability methods that track AI behavior in real time to identify and flag deviations, emergent capabilities, or anomalous outputs; - Research mechanisms for layered control, including fail-safes, oversight protocols, and intervention methods that can halt or redirect AI systems when risks are detected; - Design red-team simulations to probe weaknesses in oversight and control mechanisms, and build mitigations to close identified gaps; - Collaborate with policymakers, engineers, and other researchers to establish standards and benchmarks for AI monitoring and escalation. Ideally you’d have: - Commitment to our mission of promoting safe, secure, and trustworthy AI deployments in the industry as frontier AI capabilities continue to advance. - Practical experience conducting technical research collaboratively. 
You should be comfortable designing control and monitoring experiments for AI systems, building prototype systems, and quickly turning new ideas from the research literature into working prototypes. - A track record of published research in machine learning, particularly in generative AI. - At least three years of experience addressing sophisticated ML problems, whether in a research setting or in product development. - Strong written and verbal communication skills to operate in a cross-functional team. Nice to have: - Experience with runtime monitoring, anomaly detection, or observability for ML systems. - Familiarity with AI control or alignment research (e.g., scalable oversight, interpretability, debate). - Experience with post-training and RL techniques such as RLHF, DPO, GRPO, and similar approaches. Our research interviews are crafted to assess candidates' skills in practical ML prototyping and debugging, their grasp of research concepts, and their alignment with our organizational culture. We will not ask any LeetCode-style questions. If you’re excited about advancing AI safety and contributing to our mission, we encourage you to apply, even if your experience doesn’t perfectly align with every requirement. Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new h
Researcher, Frontier Cybersecurity Risks
ABOUT THE TEAM Preparedness is a critical Safety Research team at OpenAI, which is focused on mitigating AI threats to global security https://openai.com/index/updating-our-preparedness-framework/ that could scale to an extreme level of severity. Our work involves: 1. Measurement. Monitoring and predicting the evolving capabilities of frontier AI systems. 2. Mitigation. Keeping misuse safeguards, alignment tools, and security measures on track to adequately address extreme threats that might arise in the future. 3. Coordination. Setting mitigation targets by maintaining OpenAI’s preparedness framework https://openai.com/index/updating-our-preparedness-framework/, and partnering with other staff to achieve these targets. This is urgent, fast-paced work that has far-reaching implications for the company and for society. ABOUT THE ROLE Models are becoming increasingly capable—moving from tools that assist humans to agents that can plan, execute, and adapt in the real world. As we push toward AGI, cybersecurity becomes one of the most important and urgent frontiers: the same systems that can accelerate productivity can also accelerate exploitation. As a Researcher for cybersecurity risks, you will help design and implement an end-to-end mitigation stack to reduce severe cyber misuse across OpenAI’s products. This role requires strong technical depth and close cross-functional collaboration to ensure safeguards are enforceable, scalable, and effective. You’ll contribute directly to building protections that remain robust as products, model capabilities, and attacker behaviors evolve. IN THIS ROLE, YOU WILL: - Design and implement mitigation components for model-enabled cybersecurity misuse—spanning prevention, monitoring, detection, and enforcement—under the guidance of senior technical and risk leadership. 
- Integrate safeguards across product surfaces in partnership with product and engineering teams, helping ensure protections are consistent, low-latency, and scale with usage and new model capabilities. - Evaluate technical trade-offs within the cybersecurity risk domain (coverage, latency, model utility, and user privacy) and propose pragmatic, testable solutions. - Collaborate closely with risk and threat modeling partners to align mitigation design with anticipated attacker behaviors and high-impact misuse scenarios. - Execute rigorous testing and red-teaming workflows, helping stress-test the mitigation stack against evolving threats (e.g., novel exploits, tool-use chains, automated attack workflows) and across different product surfaces—then iterate based on findings. YOU MIGHT THRIVE IN THIS ROLE IF YOU: - Have a passion for AI safety and are motivated to make cutting-edge AI models safer for real-world use. - Bring demonstrated experience in deep learning and transformer models. - Are proficient with frameworks such as PyTorch or TensorFlow. - Possess a strong foundation in data structures, algorithms, and software engineering principles. - Are familiar with methods for training and fine-tuning large language models, including distillation, supervised fine-tuning, and policy optimization. - Excel at working collaboratively with cross-functional teams across research, security, policy, product, and engineering. - Have significant experience designing and deploying technical safeguards for abuse prevention, detection, and enforcement at scale. - (Nice to have) Bring background knowledge in cybersecurity or adjacent fields. About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectr
[Expression of Interest] Research Manager, Interpretability
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. Note: we don't have open Research Manager positions on the Interpretability team at this time. However, we're actively growing our team of Research Engineers and Research Scientists. If you're excited about interpretability research and open to an individual contributor role, we encourage you to apply. About the Interpretability team: When you see what modern language models are capable of, do you wonder, "How do these things work? How can we trust them?" The Interpretability team’s mission is to reverse engineer how trained models work, and Interpretability research is one of Anthropic’s core research bets on AI safety. We believe that a mechanistic understanding is the most robust way to make advanced systems safe. People mean many different things by "interpretability". We're focused on mechanistic interpretability, which aims to discover how neural network parameters map to meaningful algorithms. Some useful analogies might be to think of us as trying to do "biology" or "neuroscience" of neural networks, or as treating neural networks as binary computer programs we're trying to "reverse engineer". We aim to create a solid scientific foundation for mechanistically understanding neural networks and making them safe (see our vision post). We have focused on resolving the issue of "superposition" (see Toy Models of Superposition; Superposition, Memorization, and Double Descent; and our May 2023 update), which causes the computational units of the models, like neurons and attention heads, to be individually uninterpretable, and on finding ways to decompose models into more interpretable components. 
Our subsequent work, which found millions of features in Claude 3.0 Sonnet, one of our production language models, represents progress in this direction. In our most recent work, we developed methods that allow us to build circuits using features and use these circuits to understand the mechanisms associated with a model's computation and study specific examples of multi-hop reasoning, planning, and chain-of-thought faithfulness on Claude Haiku 3.5, one of our production models. This is a stepping stone towards our overall goal of mechanistically understanding neural networks. A few places to learn more about our work and team are this introduction to Interpretability from our research lead, Chris Olah; a Stanford CS25 lecture given by Josh Batson; and a TWIML AI podcast with
Research Engineer/Research Scientist, RL/Reasoning
About the Team The RL and Reasoning team drives the core reasoning paradigm and has created groundbreaking innovations such as o1 and o3. They focus on pushing the boundaries of reinforcement learning research, building next-generation generative models, and deploying them at scale. About the Role As a Research Engineer/Research Scientist at OpenAI, you will advance the frontier of AI alignment and capabilities through cutting-edge RL methods. Your work will sit at the heart of training intelligent, aligned, and general-purpose agents, including the systems that power various models. We’re looking for people who have a background in reinforcement learning research, are able to iterate quickly, and are proficient at coding. This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees. You might thrive in this role if: - You love being on the cutting edge of RL and language model research. - You’re a self-starter who takes initiative and ownership of ideas, driving them to completion. - You value principled approaches, simple experiments in tightly-controlled settings, and reaching trustworthy conclusions which stand the test of time. - You thrive in a fast-paced, dynamic, and technically complex environment where rapid iteration is key. - You’re comfortable diving into a large ML codebase to debug and improve it. - You have a deep understanding of machine learning and machine learning applications. About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement https://cdn.openai.com/policies/eeo-policy-statement.pdf. Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations. To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form https://form.asana.com/?d=57018692298241&k=5MqR40fZd7jlxVUh5J-UeA. 
No response will be provided to inquiries unrelated to job posting compliance. We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link https://form.asana.com/?k=bQ7w9h3iexRlicUdWRiwvg&d=57018692298241. OpenAI Global
Software Engineer, Enterprise AI
Scale GP (Scale Generative AI Platform) is an enterprise-grade Generative AI platform that provides APIs for knowledge retrieval, inference, evaluation, and more. We are looking for a strong engineer to join our team and help us build and scale our product in a fast-paced environment. The ideal candidate will have a strong understanding of software engineering principles and practices, as well as experience with large-scale distributed systems. You will be responsible for owning large new areas within our product, working across the backend and frontend, and interacting with LLMs and ML models. You will solve hard engineering problems in scalability and reliability. You will: - Own large new areas within our product - Work across the backend and frontend, and interact with LLMs and ML models - Deliver experiments at a high velocity and level of quality to engage our customers - Work across the entire product lifecycle from conceptualization through production - Be able, and willing, to multi-task and learn new technologies quickly Ideally you'd have: - 4+ years of full-time engineering experience, post-graduation - Experience scaling products at hypergrowth startups - Experience tinkering with or productizing LLMs, vector databases, and the other latest AI technologies - Proficiency in Python or JavaScript/TypeScript, and SQL - Experience with Kubernetes - Experience with major cloud providers (AWS, Azure, GCP) Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. 
Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend. Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in the locations of San Francisco, New York, Seattle is: $216,000 — $270,000 USD PLEASE NOTE: Our policy requires a 90-day waiting period before reconsidering candidates for the same role. This allows us to ensure a fair and thorough evaluation of all applicants. About Us: At Scale, our mission is to develop reliable AI systems for the world's most important decisions. Our products provide the high-quality data and full-sta
Research Engineer, Discovery
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the Team Our team is organized around the north star goal of building an AI scientist – a system capable of solving the long-term reasoning challenges and basic capabilities necessary to push the scientific frontier. About the role As a Research Engineer on our team you will work end to end across the whole model stack, identifying and addressing key infra blockers on the path to scientific AGI. Strong candidates should have familiarity with elements of language model training, evaluation, and inference, and an eagerness to dive in and quickly get up to speed in areas where they are not yet experts. This may include performance optimization, distributed systems, VM/sandboxing/container deployment, and large scale data pipelines. Join us in our mission to develop advanced AI systems pushing the frontiers of science and benefiting humanity. Responsibilities: - Design and implement large-scale infrastructure systems to support AI scientist training, evaluation, and deployment across distributed environments - Identify and resolve infrastructure bottlenecks impeding progress toward scientific capabilities - Develop robust and reliable evaluation frameworks for measuring progress towards scientific AGI. 
- Build scalable and performant VM/sandboxing/container architectures to safely execute long-horizon AI tasks and scientific workflows - Collaborate to translate experimental requirements into production-ready infrastructure - Develop large scale data pipelines to handle advanced language model training requirements - Optimize large scale training and inference pipelines for stable and efficient reinforcement learning You may be a good fit if you: - Have 6+ years of highly-relevant experience in infrastructure engineering with demonstrated expertise in large-scale distributed systems - Are a strong communicator and enjoy working collaboratively - Possess deep knowledge of performance optimization techniques and system architectures for high-throughput ML workloads - Have experience with containerization technologies (Docker, Kubernetes) and orchestration at scale - Have a proven track record of building large-scale data pipelines and distributed storage systems - Excel at diagnosing and resolving complex infrastructure challenges in production environments - Can work effectively across the full ML stack from data pipelines to performance optimization - Have experience collaborating with other researchers to scale experimental ideas - Thrive in fast-paced environments and can rapidly iterate from experimentation to production Strong candidates may also have: - Experience with language model training infrastructure and distributed ML frameworks (PyTorch, JAX, etc.) - Background in building infrastructure for AI research labs or large-scale ML organizations - Knowledge of GPU/TPU architectures and language mod
Research Engineer, Pretraining
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. Anthropic is at the forefront of AI research, dedicated to developing safe, ethical, and powerful artificial intelligence. Our mission is to ensure that transformative AI systems are aligned with human interests. We are seeking a Research Engineer to join our Pretraining team, responsible for developing the next generation of large language models. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems. Key Responsibilities: - Conduct research and implement solutions in areas such as model architecture, algorithms, data processing, and optimizer development - Independently lead small research projects while collaborating with team members on larger initiatives - Design, run, and analyze scientific experiments to advance our understanding of large language models - Optimize and scale our training infrastructure to improve efficiency and reliability - Develop and improve dev tooling to enhance team productivity - Contribute to the entire stack, from low-level optimizations to high-level model design Qualifications: - Advanced degree (MS or PhD) in Computer Science, Machine Learning, or a related field - Strong software engineering skills with a proven track record of building complex systems - Expertise in Python and experience with deep learning frameworks (PyTorch preferred) - Familiarity with large-scale machine learning, particularly in the context of language models - Ability to balance research goals with practical engineering constraints - Strong problem-solving skills and a results-oriented mindset - 
Excellent communication skills and ability to work in a collaborative environment - Care about the societal impacts of your work Preferred Experience: - Work on high-performance, large-scale ML systems - Familiarity with GPUs, Kubernetes, and OS internals - Experience with language modeling using transformer architectures - Knowledge of reinforcement learning techniques - Background in large-scale ETL processes You'll thrive in this role if you: - Have significant software engineering experience - Are results-oriented with a bias towards flexibility and impact - Willingly take on tasks outside your job description to support the team - Enjoy pair programming and collaborative work
ML Systems Engineer, Robotics
Scale's Physical AI business unit is dedicated to solving the data bottleneck across Robotics, Autonomous Vehicles, and Computer Vision. This position will be a key contributor in conducting applied research in Physical AI and developing ML pipelines for processing, training, and fine-tuning on data collected by Scale, with a specific focus on optimizing algorithms and pipelines to run efficiently on GPUs in the cloud. In this role, you will have the opportunity to advance research, shape Scale’s offerings, and expand the frontier of data and model evaluation for Physical AI. The Role As an ML Systems Engineer on the Physical AI team, you will design and build platforms for scalable, reliable, and efficient serving of foundation models specifically tailored for physical agents. Our platform powers cutting-edge research and production systems, supporting both internal research discovery and external customer use cases for autonomous vehicles and robotics. The ideal candidate combines strong ML fundamentals with deep expertise in backend system design. You’ll work in a highly collaborative environment, bridging the gap between Physical AI research and production engineering to accelerate innovation across the company. You Will: - Build & Scale: Maintain fault-tolerant, high-performance systems for serving robotics-related models and foundation models at scale, ensuring low latency for real-time applications. - Platform Development: Build an internal platform to empower model capability discovery, enabling faster iteration cycles for research teams working on robotics. - Collaborate: Work closely with Robotics researchers and Computer Vision engineers to integrate and optimize models for production and research environments. - Design Excellence: Conduct architecture and design reviews to uphold best practices in system scalability, reliability, and security. 
- Observability: Develop monitoring and observability solutions to ensure system health and real-time performance tracking of model inference. - Lead: Own projects end-to-end, from requirements gathering to implementation, in a fast-paced, cross-functional environment. Ideally, You’d Have: - Experience: 4+ years of experience building large-scale, high-performance backend systems, with deep experience in machine learning infrastructure. - Algorithm Optimization: Deep experience optimizing computer vision and other machine learning algorithms for cloud environments, including GPU-level algorithm optimizations (e.g., CUDA, kernel tuning). - Programming: Strong skills in one or more systems-level languages (e.g., Python, Go, Rust, C++). - Systems Fundamentals: Deep understanding of serving and routing fundamentals (e.g., rate limiting, load balancing, compute budgets, concurrency) for data-intensive applications. - Infrastructure: Experience with containers (Docker), orchestration (Kubernetes), and cloud providers (AWS/GCP). - IaC: Familiarity with infrastructure as code (e.g., Terraform). - Mindset: Proven ability to solve complex problems and work independently in fast-moving environments. Nice to Haves: - Exposure to Vision-Language-Action (VLA) models. - Knowledge of high-performance video processing (e.g., FFmpeg, NVDEC/NVENC) or 3D data handling (point clouds). - Familiarity with robotics middleware (e.g., ROS/ROS2) or AV data formats.
Staff Machine Learning Research Scientist, LLM Evals
As the leading data and evaluation partner for frontier AI companies, Scale is dedicated to advancing the evaluation and benchmarking of large language models (LLMs). We are building industry-leading LLM evals, setting new standards for model performance assessment. Our mission is to develop rigorous, scalable, and fair evaluation methodologies to drive the next generation of AI capabilities. Our Research teams work with the industry’s leading AI labs to provide high quality data and accelerate progress in GenAI research. As a Staff Machine Learning Research Scientist on the LLM Evals team, you will lead the development of novel evaluation methodologies, metrics, and benchmarks to measure the capabilities and limitations of frontier LLMs. You will help define what "good" looks like in generative AI, driving research that informs both our internal roadmap and the broader research community. This role is critical for designing and executing a roadmap that defines best practices in data driven AI development and will accelerate the next generation of generative AI models in partnership with top foundational model labs. You will: - Drive research on the effectiveness and limitations of existing LLM evaluation techniques. - Design and develop novel evaluation benchmarks for large language models, covering areas such as instruction following, factuality, robustness, and fairness. - Communicate, collaborate, and build relationships with clients and peer teams to facilitate cross-functional projects. - Collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols. - Implement scalable and reproducible evaluation pipelines using modern ML frameworks. - Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives. - Mentor and guide research scientists and engineers, providing technical leadership across cross-functional projects. 
- Stay deeply engaged with the ML research community, tracking emerging work and contributing to the advancement of LLM evaluation science. - Thrive in a high-energy, fast-paced startup environment and be ready to dedicate the time and effort needed to drive impactful results. Ideally you'd have: - 5+ years of hands-on experience in large language model, NLP, and Transformer modeling, in both research and engineering settings - A track record of landing major research impact in a fast-paced environment - Experience tech leading a team of research scientists and research engineers - Excellent written and verbal communication skills - Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals - Previous experience in a customer-facing role. Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted eq
Senior Machine Learning Engineer, Public Sector
The goal of a Senior Machine Learning Engineer at Scale is to leverage techniques in the fields of generative AI, computer vision, reinforcement learning, and agentic AI to improve Scale's products and customer experience in production environments. Our machine learning engineers take advantage of robust internal infrastructure and unique access to massive datasets to deliver improvements to our customers. Our Public Sector Machine Learning team is focused on deploying cutting-edge models to mission-critical government systems through products like Donovan and Thunderforge. Our work spans multiple modalities, with a strong focus on both large language models and computer vision. On the LLM side, we are developing agentic systems that help solve complex operational and planning challenges for government partners. This includes building agent frameworks that integrate with custom retrieval pipelines and production APIs, as well as evaluation tools to benchmark and refine agent behavior. We're also advancing research in areas like reinforcement learning for agentic LLMs, with successful deployment into real-world operational environments. On the computer vision front, we're training advanced models to increase labeling throughput and automate perception tasks. Our efforts include building large-scale fine-tuning pipelines, training models across multiple modalities, and developing generalizable vision foundation models to support a wide range of defense applications. 
You will: - Take state-of-the-art models developed internally and by the community, and use them in production to solve problems for our customers and taskers - Improve and maintain production models through retraining, hyperparameter tuning, and architectural updates, while preserving core performance characteristics - Collaborate with product and research teams to identify and prototype ML-driven product enhancements, including for upcoming product lines - Work with massive datasets to develop generic models as well as fine-tune models for specific products - Build scalable machine learning infrastructure to automate and optimize our ML services - Serve as a cross-functional representative and advocate for machine learning techniques across engineering and product organizations - Be comfortable learning new technologies quickly and managing multiple priorities in a fast-paced environment - Be comfortable with light travel (approximately 10%) for customer interaction and team needs - This role will require an active security clearance or the ability to obtain a security clearance Ideally You’d Have: - Extensive experience with GenAI, Agentic AI, natural language processing, deep learning and deep reinforcement learning, or computer vision in a production environment - Solid background in algorithms, data structures, and object-oriented programming - Strong programming skills in Python, experience in TensorFlow or PyTorch Nice to Haves: - Graduate degree in Computer Science, Machine Learning or Artificial Intelligence specialization - Experience working with cloud platforms (e.g., AWS or GCP) and deploying machine learning models in cloud environments - Experience with computer vision, generative AI models, large language models, or agentic systems - Familiarity with ML evaluation frameworks and agentic model design
Software Engineer, Safeguards Labs
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role Safeguards Labs is a new team operating at the intersection of research and engineering, chartered to investigate novel safety methods that protect Claude and the people who use it. We prototype new approaches to safe models, usage safeguards, and production safety, pressure-testing ideas before they graduate into production systems run by our partner Safeguards teams. We're hiring software engineers to partner with our research engineers and turn promising prototypes into reliable, production-grade safeguards. The team is small, so each engineer has substantial latitude over what they work on and high leverage on the team's direction. Key responsibilities - Take research prototypes and harden them into production services that integrate with Anthropic's real-time safeguards path. - Build data and evaluation infrastructure that lets the team iterate on prototypes quickly and measure whether safeguards actually work, including in agentic settings. - Own deployment, monitoring, and reliability for systems Labs ships. - Build internal tooling that helps investigators surface and act on abuse patterns. - Collaborate with research engineers on scoping and contribute to decisions about which prototypes are ready to graduate. Minimum qualifications - Strong proficiency in Python and comfort working with large datasets. - A track record of designing, building, and operating production backend systems or data pipelines. - Experience taking software from prototype to production, including testing, monitoring, and reliability work. 
- Working familiarity with how large language models operate, even if LLMs aren't your primary background. - Care about the societal impacts of AI and want your work to directly reduce real-world harm. Preferred qualifications - At least 5 years of software engineering experience. - Experience deploying ML systems or classifiers into production. - Background in trust and safety, integrity, fraud detection, threat intelligence, or adversarial ML. - Experience building developer-facing tooling or platforms that accelerate research workflows. - Familiarity with evaluation methodologies for language models. - Experience with agentic environments. - A history of partnering with researchers and successfully transferring prototypes into production. The annual compensation range for this role is listed below.
AI Systems Engineer, Codex Agents
AI Systems Engineer - Codex Core Agents About The Team The Codex Core Agents team builds the agent harness that turns model capability into real-world action. We own the systems around the model: prompting and interpreting model outputs, executing actions safely in real environments, and feeding production experience back into better models and better agent behavior. This team sits close to research and works across the stack: harness, model interaction, inference, sandboxed execution, orchestration, evals, production reliability, and the performance envelope around tokens, latency, cost, capacity, and quality. The harness is open source and increasingly part of how models are trained and evaluated, making this one of the highest-leverage layers in Codex. About The Role We’re looking for engineers to build the AI systems that make Codex agents dependable in production. The ideal candidate is an agent-systems builder: hands-on across low-level systems and ML workflows, able to debug Codex behavior end to end across the harness, model behavior, inference/runtime stack, GPU fleet, and product surface. You’ll work with research, infrastructure, and product to design agent harness capabilities, run experiments and ablations across the model + system prompt + harness stack, build frameworks for assessing production agent performance, and turn messy failures into durable improvements. What You’ll Do - Design and build the core agent harness and execution loop that lets Codex agents interpret model outputs, use tools, execute code, and complete long-horizon tasks safely. - Build sandboxing, isolation, orchestration, state, and workflow infrastructure for agents operating in real development environments. - Develop evaluation, experimentation, and debugging systems that distinguish harness issues, model behavior, inference/runtime issues, and product failures. 
- Run ablations across prompts, model-facing interfaces, context construction, tool-use strategies, and harness behavior to improve solve rate, reliability, latency, and cost. - Improve observability, profiling, and diagnostics across the agent stack, from backend systems to inference, GPUs, and fleet capacity. - Work closely with research to make the harness trainable, measurable, and useful for improving frontier agentic models. - Build shared primitives that make Codex faster, safer, more reliable, and easier for other teams and open-source users to build on. You Might Be A Good Fit If You - Have built or operated production systems in distributed systems, infrastructure, developer tooling, sandboxing, virtualization, cloud platforms, or ML systems. - Enjoy working across layers: Rust systems code, Python configuration layers, APIs, agent orchestration, evals, logs/traces, inference behavior, runtime constraints, and user outcomes. - Have hands-on experience with LLM applications, coding agents, evals, model deployment, inference, compiler/runtime performance, or developer platforms. - Care deeply about reliability, safety, performance, debuggability, and clean abstractions. - Can debug from evidence and move quickly from ambiguous production failures to practical, durable fixes. - Want to work close to research while still shipping changes to production. - Still write meaningful code, show strong ownership, and can lead scoped or multi-team AI systems work. Bonus Points - Deep Rust, systems, sandboxing, isolation, or low-level platform experience. - Experience with coding agents, agent harnesses, tool-using LLM systems, model evals, or post-training feedback loops. - Background in compilers, kernels, runtimes, inference optimization, GPU systems, benchmarking, profiling, or performance engineering. - Experience building production infrastructure used by many engineers or users under demanding reliability and security constraints. 
- Open-source infrastructure or developer-platform work with strong taste for APIs and usability. About OpenAI OpenAI is an AI research and deploymen
Director, Enterprise Machine Learning & Research
The Enterprise ML team works on the front lines of the AI revolution, partnering deeply with customers to identify high-impact business problems and build cutting-edge AI systems using Scale’s proprietary research, data, and infrastructure, unlocking domain expertise through high-quality data and expert feedback. As Director of Enterprise ML, you will lead a world-class team of research scientists and engineers, define the research roadmap, and drive execution from early prototyping to deployment. You’ll thrive in a fast-moving environment, balancing deep technical leadership with people management, vision setting, and delivery. This role is ideal for a leader who thrives in ambiguity, understands both frontier GenAI capabilities and their limitations, and is motivated by turning research into durable, production-ready systems. What You’ll Do - Lead, mentor and grow a team of research scientists and engineers working on GenAI research initiatives (e.g., evaluation, post-training, agents, RL environments). - Define and drive a multi-year research roadmap: identify key scientific questions, set milestones, allocate resources, and ensure rigorous execution. - Collaborate cross-functionally with engineering, product, client-facing teams and external academic or industry partners to translate research into components, insights, and actionable outcomes. - Communicate compellingly: publish research, present at conferences, engage in open-source contributions, and represent the team externally. - Drive an inclusive, high-performing culture: help your team through technical challenges, provide growth opportunities, and attract top talent. - Stay deeply connected to the research community, understanding major trends, and helping set them. - Thrive in a high-energy, fast-paced startup environment and be ready to dedicate the time and effort needed to drive impactful results. 
What We’re Looking For Core Qualifications - 8+ years of hands-on research experience (PhD or equivalent preferred) in machine learning, deep learning, generative models, agent/RL systems or related domains. - A strong track record of research excellence, including publications in top-tier ML/AI venues (NeurIPS, ICML, ICLR, ACL, etc.). - A track record of landing major research impact in a fast-paced environment. - Experience leading or managing research teams. You’re excited to mentor, coach and develop talent. - Excellent written and verbal communication skills. You are able to articulate research ideas and outcomes to both technical and non-technical stakeholders. - Exceptional communication and stakeholder management skills, with the ability to influence executives, customers, and cross-functional partners. Nice to Have - Hands-on experience building and deploying agent-based, tool-augmented, or workflow-driven LLM systems in enterprise environments - Prior ownership of enterprise AI platforms, internal ML products, or customer-facing AI services at scale - Proven track record of partnering directly with enterprises to identify high-impact use cases and deliver measurable business outcomes Compensation packages at Scale for eligible roles in
Staff Infrastructure Software Engineer, Enterprise AI
Scale GP is building the infrastructure that makes enterprise AI seamless. We are looking for a Senior or Staff Infrastructure Engineer to act as a primary technical lead, engineering the 'paved road' for our knowledge retrieval and inference engines. You won't just be managing resources; you’ll be defining the deployment standards for Agentic workflows at scale. Your mission is to bridge the gap between complex AI orchestration and world-class infrastructure, ensuring our platform remains the most reliable destination for enterprise agents. The ideal candidate thrives in a fast-paced environment, has a passion for both deep technical work and mentoring, and is capable of setting a long-term technical strategy for a critical domain while maintaining a strong, hands-on delivery focus. You will architect and implement solutions across multiple cloud providers (GCP, Azure, AWS) for customers in diverse, highly-regulated industries like healthcare, telecom, finance, and retail. What You’ll Do: - Architect multi-cloud systems and abstractions to allow the SGP platform to run on top of existing Cloud providers. - Use our own data and AI platform to analyze build and test logs and metrics to identify areas for improvement. - Define the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers. - Enhance engineering and infrastructure efficiency, reliability, accuracy, and response times, including CI/CD processes, test frameworks, data quality assurance, end-to-end reconciliation, and anomaly detection. - Collaborate with platform and product teams to develop and implement innovative infrastructure that scales to meet evolving needs. - Design and champion highly scalable, reliable, and low-latency infrastructure and frameworks for building, orchestrating, and evaluating multi-agent systems at enterprise scale. 
- Lead the infrastructure roadmap with a strong focus on compliance, privacy, and security standards, including designing change management and data isolation strategies. - Own the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics) to proactively ensure system health and enable rapid incident response. - Drive developer efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) paradigms throughout the engineering organization to improve workflows and operational efficiency. What We're Looking For: - Proven experience in a senior role, with 5+ years of full-time software engineering experience. - Deep understanding of modern infrastructure practices, including CI/CD, IaC (e.g., Terraform, Helm Charts), container orchestration (e.g., Kubernetes) and observability platforms (e.g., Datadog, Prometheus, Grafana). - Extensive experience with at least one major cloud provider (AWS, Azure, or GCP). - Strong knowledge of security and compliance in enterprise environments, with a focus on access management, data isolation, and customer-specific VPC setups. - Proficiency in Python or JavaScript/TypeScript, and SQL. - Bonus points: Hands-on experience and a passion for working with Agents, LLMs, vector databases, and other emerging AI technologies. Compensation packages at
Research Engineer / Research Scientist, Tokens
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. You want to build large scale ML systems from the ground up. You care about making safe, steerable, trustworthy systems. As a Research Engineer, you'll touch all parts of our code and infrastructure, whether that's making the cluster more reliable for our big jobs, improving throughput and efficiency, running and designing scientific experiments, or improving our dev tooling. You're excited to write code when you understand the research context and more broadly why it's important. Note: This is an "evergreen" role that we keep open on an ongoing basis. We receive many applications for this position, and you may not hear back from us directly if we do not currently have an open role on any of our teams that matches your skills and experience. We encourage you to apply despite this, as we are continually evaluating for top talent to join our team. You are also welcome to reapply as you gain more experience, but we suggest only reapplying once per year. We may also put up separate, team-specific job postings. In those cases, the teams will give preference to candidates who apply to the team-specific postings, so if you are interested in a specific team please make sure to check for team-specific job postings! You may be a good fit if you: - Have significant software engineering experience - Are results-oriented, with a bias towards flexibility and impact - Pick up slack, even if it goes outside your job description - Enjoy pair programming (we love to pair!) 
- Want to learn more about machine learning research - Care about the societal impacts of your work Strong candidates may also have experience with: - High performance, large-scale ML systems - GPUs, Kubernetes, PyTorch, or OS internals - Language modeling with transformers - Reinforcement learning - Large-scale ETL Representative projects: - Optimizing the throughput of a new attention mechanism - Comparing the compute efficiency of two Transformer variants - Making a Wikipedia dataset in a format models can easily consume - Scaling a distributed training job to thousands of GPUs - Writing a design doc for fault tolerance strategies - Creating an interactive visualization of attention between tokens in a language model The annual compensation range for this role is listed below. For sales roles, the range provided is the role’s On Target Earnings ("OTE") range, meaning that the range includes both the
Research Engineer, Production Model Post-Training
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role Anthropic's production models undergo sophisticated post-training processes to enhance their capabilities, alignment, and safety. As a Research Engineer on our Post-Training team, you'll train our base models through the complete post-training stack to deliver the production Claude models that users interact with. You'll work at the intersection of cutting-edge research and production engineering, implementing, scaling, and improving post-training techniques like Constitutional AI, RLHF, and other alignment methodologies. Your work will directly impact the quality, safety, and capabilities of our production models. Note: For this role, we conduct all interviews in Python. This role may require responding to incidents on short-notice, including on weekends. 
Responsibilities: - Implement and optimize post-training techniques at scale on frontier models - Conduct research to develop and optimize post-training recipes that directly improve production model quality - Design, build, and run robust, efficient pipelines for model fine-tuning and evaluation - Develop tools to measure and improve model performance across various dimensions - Collaborate with research teams to translate emerging techniques into production-ready implementations - Debug complex issues in training pipelines and model behavior - Help establish best practices for reliable, reproducible model post-training You may be a good fit if you: - Thrive in controlled chaos and are energised, rather than overwhelmed, when juggling multiple urgent priorities - Adapt quickly to changing priorities - Maintain clarity when debugging complex, time-sensitive issues - Have strong software engineering skills with experience building complex ML systems - Are comfortable working with large-scale distributed systems and high-performance computing - Have experience with training, fine-tuning, or evaluating large language models - Can balance research exploration with engineering rigor and operational reliability - Are adept at analyzing and debugging model training processes - Enjoy collaborating across research and engineering disciplines - Can navigate ambiguity and make progress in fast-moving research environments Strong candidates may also: - Have experience with LLMs - Have a keen interest in AI safety and responsible deployment We welcome candidates at various
Research Engineer / Machine Learning Engineer - Applied Voice
About the Team OpenAI is at the forefront of artificial intelligence, driving innovation and shaping the future with cutting-edge research. Our mission is to ensure that AI's benefits reach everyone. We are looking for visionary Research Engineers to join our Applied Voice Team, where you'll conduct groundbreaking research on speech models and transform it into real-world applications that can change industries, enhance human creativity, and solve complex problems. About the Role As a Research Engineer in OpenAI's Applied Voice Team, you will have the opportunity to work with some of the brightest minds in AI. You'll design and build state-of-the-art speech models (speech-to-speech, transcription, text-to-speech, etc.) and help turn research breakthroughs into tangible OpenAI speech products. If you're excited about making AI technology accessible and impactful, this role is your chance to make a significant mark. Some of our recent work: - Introducing gpt-realtime https://openai.com/index/introducing-gpt-realtime/ - Demo - gpt-realtime-1.5 https://x.com/OpenAIDevs/status/2026014334787461508 - ASR, TTS https://x.com/OpenAIDevs/status/2000678814628958502 - May 2026 - Advancing voice intelligence with new models in the API https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/ In this role, you will: - Innovate and Build: Design and build advanced machine learning models that solve real-world problems. Bring OpenAI's research from concept to implementation, creating AI-driven applications with a direct impact. - Collaborate with the Best: Work closely with software engineers, product managers and forward deployed engineers to understand complex business challenges, address customer concerns and deliver AI-powered solutions. Be part of a dynamic team where ideas flow freely and creativity thrives. 
- Optimize and Scale: Implement scalable data pipelines, optimize models for performance and accuracy, and ensure they are production-ready. Contribute to projects that require cutting-edge technology and innovative approaches. - Learn and Lead: Stay ahead of the curve by engaging with the latest developments in machine learning and AI. Take part in code reviews, share knowledge, and lead by example to maintain high-quality engineering practices. - Make a Difference: Monitor and maintain deployed models to ensure they continue delivering value. Your work will directly influence how AI benefits individuals, businesses, and society at large. You might thrive in this role if you: - Have a Master's or PhD degree in Computer Science, Machine Learning, or a related field. - Have 2+ years of professional engineering experience (excluding internships) in relevant roles at tech and product-driven companies. - Have demonstrated experience in deep learning and transformer models. - Are proficient in frameworks like PyTorch or TensorFlow. - Have a strong foundation in data structures, algorithms, and software engineering principles. - Are familiar with methods of training and fine-tuning large language models, such as distillation, supervised fine-tuning, and policy optimization. - Have experience with speech models (a plus). - Have excellent problem-solving and analytical skills, with a proactive approach to challenges. - Can work collaboratively with cross-functional teams. - Can move fast in an environment where things are sometimes loosely defined and may have competing priorities or deadlines. - Enjoy owning problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done. About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and val
Research Engineer, Frontier Evals & Environments
About the team The Agent Post-Training team creates the frontier agents OpenAI ships to the world. We are training the models behind our agents in Codex, ChatGPT, the API, and other frontier products: persistent, proactive intelligence that can operate computers, collaborate with people and other agents, and expand what people and organizations can imagine, attempt, and achieve. We define what the next generation of agents should be able to do, build the training signal that teaches those abilities, and run the experiments that make them real. Our work spans coding, tool use, computer use, multi-agent coordination, long-horizon execution, factuality, instruction following, calibrated reasoning, and taste. Our team is where new model capabilities get made. We build the data, environments, graders, training methods, and feedback loops that shape what OpenAI's next agents can do, then carry those capabilities through major training runs and into the products people use. About the Role As a researcher working on Frontier Evals & Environments, you will help build north star model environments to drive progress towards safe AGI/ASI. Your work will directly guide the research programs of the most ambitious training runs happening at OpenAI. Some prior open-sourced evaluations built by researchers in this role include GDPval https://openai.com/index/gdpval/, SWE-bench Verified https://openai.com/index/introducing-swe-bench-verified/, MLE-bench https://openai.com/index/mle-bench/, PaperBench https://openai.com/index/paperbench/, and SWE-Lancer https://openai.com/index/swe-lancer/. If you are interested in feeling firsthand the fast progress of our models, and steering them towards good outcomes, this is the role for you. You will work with researchers, engineers, product teams, infrastructure teams, and safety/alignment partners to decide what should go into major model runs, measure whether it worked, and ship improvements into products used by real people. 
This is a high-agency role for people who want their work to land directly in frontier models. In this role, you'll: - Create ambitious RL environments to push our models to their limits, and measure frontier model capabilities, skills, and behaviors - Develop new methodologies for automatically exploring the behavior of these models - Dive deep into the science of measurement, including understanding scalability, reliability, and variance of our evaluation methodology - Help steer training for our largest training runs, and see the future first - Design scalable systems and processes to support continuous evaluation - Build self-improvement loops to automate model understanding You might thrive in this role if you - Have strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field, and can learn quickly across the parts you have not worked in before. - Have hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, tool-using agents, or production ML systems. - Are excited by open-ended problems where the path is unclear, the signal is noisy, and the right answer requires both research taste and engineering execution. - Care about product impact and model behavior, not just benchmark movement. You have opinions about what makes an agent useful, reliable, honest, tasteful, and easy to work with. - Can move from a vague behavioral problem to a concrete experiment: define the hypothesis, build the pipeline, run the model, analyze the result, and decide what to do next. - Are comfortable working across research, product, infrastructure, data, evals, and safety boundaries, and can communicate clearly with each group. - Like building load-bearing systems and processes when that is what the team needs, even if the work is not glamorous. 
- Want to train and ship the models that make agents genuinely useful for developers, enterprises, researchers, and everyday users. About
Education Engineer
ABOUT US At LangChain, our mission is to make intelligent agents ubiquitous. We build the foundation for agent engineering in the real world, helping developers move from prototypes to production-ready AI agents that teams can rely on. We began as widely adopted open-source tools and have grown to also offer a platform for building, evaluating, deploying, and operating agents at scale. With $125M raised at Series B from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we’re at a stage where we’re continuing to develop new products, growth is accelerating, and all team members have meaningful impact on what we build and how we work together. LangChain is a place where your contributions can shape how this technology shows up in the real world. Today, our platform includes LangSmith (Observability, Evaluation, Deployment, Fleet, and Sandboxes), our open source frameworks (LangChain, LangGraph, and Deep Agents), and the newly launched LangSmith Engine for autonomous agent improvement. We have 100M+ monthly open source downloads, 6,000+ active LangSmith customers, and 5 of the Fortune 10 use LangSmith in production (+ 35% of the Fortune 500 overall), including teams at Klarna, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, LinkedIn, Monday.com, Nvidia, and Bridgewater. ABOUT THE ROLE LangChain Academy is how developers, engineers, and AI practitioners go from curious to capable on LangChain’s products. We’re looking for an Education Engineer to own the development of curriculum that turns developers into proficient agent builders. This is a high-craft, high-ownership role at the intersection of software engineering and education. You’ll design and build the courses, tutorials, and live workshops that teach the LangChain ecosystem to a community of over 1 million developers — many of whom are building production agents for the first time. 
The bar is high: developers can tell the difference between content that genuinely helps them build and content that’s just a feature tour. Your job is to create content that consistently clears that bar. You’ll work closely with LangChain’s engineering team to stay at the frontier of what’s possible with LangGraph, LangChain, and the broader ecosystem, and work with the Technical Content Manager to shape curriculum strategy and prioritize what gets built. You’ll also partner with the Education Marketing Lead on course launches, YouTube content, and live events. WHAT YOU’LL DO CURRICULUM & COURSE DEVELOPMENT - Build the courses that define how developers learn to build agents. Design and develop end-to-end curriculum for LangChain Academy - from scoping and structure to hands-on labs and assessments — that take developers from zero to proficient with our products. - Partner with LangChain engineers to stay at the frontier. Work closely with the engineering team to understand what’s new, what’s changing, and what developers need to know. Translate that into curriculum. - Write code that teaches. Build the notebooks, example apps, and working code samples that sit at the core of every course. The code needs to be clean, well-structured, and genuinely instructive. - Build the conceptual foundation developers need to build well, not just build. As coding agents handle more of the implementation, what developers need most is a clear mental model of what’s possible and what good looks like — the right architecture, the right tradeoffs, the right patterns. Craft explanations, diagrams, and frameworks that develop genuine understanding. - Keep curriculum current. Agent engineering is a fast-moving field. Own the ongoing review and revision of existing courses to keep them accurate, relevant, and aligned with the latest LangChain releases and agentic applications. LIVE EDUCATION & COMMUNITY - Represent LangChain at workshops, meetups, and conferences. 
Design and deliver live technical education at developer events — from intimate hackathons to large conference stages. Be the p
Fullstack Software Engineer, Applied AI
ABOUT US At LangChain, our mission is to make intelligent agents ubiquitous. We build the foundation for agent engineering in the real world, helping developers move from prototypes to production-ready AI agents that teams can rely on. We began as widely adopted open-source tools and have grown to also offer a platform for building, evaluating, deploying, and operating agents at scale. With $125M raised at Series B from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we’re at a stage where we’re continuing to develop new products, growth is accelerating, and all team members have meaningful impact on what we build and how we work together. LangChain is a place where your contributions can shape how this technology shows up in the real world. Today, our platform includes LangSmith (Observability, Evaluation, Deployment, Fleet, and Sandboxes), our open source frameworks (LangChain, LangGraph, and Deep Agents), and the newly launched LangSmith Engine for autonomous agent improvement. We have 100M+ monthly open source downloads, 6,000+ active LangSmith customers, and 5 of the Fortune 10 use LangSmith in production (+ 35% of the Fortune 500 overall), including teams at Klarna, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, LinkedIn, Monday.com, Nvidia, and Bridgewater. ABOUT THE TEAM The Applied AI team builds the agents that show the world what's possible with LangChain. We ship open source reference agents like Open SWE, Open Canvas, and our Deep Research agent that developers across the community use as starting points for their own production systems, while also building internal agents that power LangChain's own GTM and engineering workflows. It's a small, fast-moving team that operates at the frontier, iterating rapidly, running rigorous evals on our own work, and feeding hard-won learnings back into the platform. If you want to work on the frontier of agent-building, this may be the team for you. 
ABOUT THE ROLE We’re hiring fullstack Applied AI Engineers to help us build AI agents that power every part of LangChain, from Marketing and GTM to Recruiting, Support, Internal Tools, and our Core Product. In this role you will own a problem space and work closely with that function to design, build, and deploy production-grade agents, workflows, and applications that transform how we operate. Your work will directly accelerate LangChain’s mission to make intelligent, autonomous software a reality both internally and for our customers. Some of these projects will be open source, contributing to the LangChain and LangGraph ecosystem and setting new standards for how companies build with AI. *This role will be based in our San Francisco or New York office. Employees within commuting distance work from the office five days per week. Candidates who live outside commuting distance (e.g. >1hr each way) may be eligible for hybrid arrangements depending on location and role requirements. WHAT YOU WILL DO - Design, implement, and deploy end-to-end AI workflows and agents that solve real problems across multiple business domains. - Develop and iterate on agent architectures, evaluation pipelines, and performance frameworks to ensure reliability and measurable outcomes. - Translate emerging AI research and tooling into practical, production-ready solutions. - Communicate technical decisions, trade-offs, and insights clearly to both technical and non-technical stakeholders. - Collaborate cross-functionally, embedding with teams like Marketing, GTM, Recruiting, or Product to identify opportunities for agent-driven automation and measurable business impact. - Contribute to the LangChain and LangGraph ecosystem, including open source components, documentation, and shared tools. WHAT YOU WILL BRING - Experienced software engineer with a strong track record shipping AI or ML-powered applications (typically 3+ years, including at least 1 year building LLM systems in production). 
- Hands-on experience implementing ev
Research Engineer / Scientist, Societal Impacts
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the Role As a Research Engineer / Scientist on the Societal Impacts team, you'll design and build critical infrastructure that enables and accelerates foundational research into how our AI systems impact people and society. Your work will directly contribute to our research publications, policy campaigns, safety systems, and products. Our team combines rigorous empirical methods with creative technical approaches. We’re currently grappling with big questions on how AI might impact the future of work, people's wellbeing, education, and more. Additionally, we are continuously studying socio-technical alignment (what values do our systems have?), and evaluating novel AI capabilities as they arise. We develop privacy-preserving tools to measure AI's effects at scale, conduct mixed-methods studies of human-AI interaction, and translate research insights into actionable recommendations for both product and policy. You can learn more about our team here. Strong candidates will have a track record of running & designing experiments relating to machine learning systems, building data processing pipelines, architecting & implementing high-quality internal infrastructure, working in a fast-paced startup environment, navigating the ambiguity inherent to novel empirical research, and demonstrating an eagerness to develop their own research & technical skills. The ideal candidate will enjoy a mixture of running experiments, developing new tools & evaluation suites, working cross-functionally across multiple research and product teams, and striving for beneficial & safe uses for AI. 
Responsibilities: - Design and implement scalable technical infrastructure that enables researchers to efficiently run experiments and evaluate AI systems. - Architect systems that can handle uncertain and changing requirements while maintaining high standards of reliability - Lead technical design discussions to ensure our infrastructure can support both current needs and future research directions - Partner closely with researchers, data scientists, policy experts, and other cross-functional partners to advance Anthropic’s safety mission - Interface with and improve our internal technical infrastructure and tools - Generate net-new insights about the potential societal impact of systems being developed by Anthropic - Ship changes that help improve our models and products based on the empirical research the Societal Impacts team is conducting
Machine Learning Research Scientist, Reasoning
About Scale At Scale AI, our mission is to accelerate the development of AI applications. For 8 years, Scale has been the leading AI data foundry, fueling the most exciting advancements in AI, including generative AI, defense applications, and autonomous vehicles. With our recent Series F round, we’re amplifying access to high-quality data to drive progress toward Artificial General Intelligence (AGI). Building on our history of model evaluation with enterprise and government customers, we are expanding our capabilities to set new standards for both public and private evaluations. About This Role This role operates at the forefront of AI research and real-world implementation, with a strong focus on reasoning within large language models (LLMs). The ideal candidate will study the data types critical for advancing LLM-based agents, including browser and software engineering (SWE) agents. You will play a key role in shaping Scale’s data strategy by identifying the most effective data sources and methodologies for improving LLM reasoning. Success in this role requires a deep understanding of LLMs, planning algorithms, and novel approaches to agentic reasoning, as well as creativity in tackling challenges related to data generation, model interaction, and evaluation. You will contribute to impactful research on language model reasoning, collaborate with external researchers, and work closely with engineering teams to bring state-of-the-art advancements into scalable, real-world solutions. Ideally, you’d have: - Practical experience working with LLMs, with proficiency in frameworks like PyTorch, JAX, or TensorFlow. You should also be skilled at rapidly interpreting research literature and turning new ideas into working prototypes. - A track record of published research in top ML and NLP venues (e.g., ACL, EMNLP, NAACL, NeurIPS, ICML, ICLR, CoLLM, etc.). 
- At least three years of experience solving complex ML challenges, either in a research setting or product development, particularly in areas related to LLM capabilities and reasoning. - Strong written and verbal communication skills, along with the ability to work effectively across teams. Nice to have: - Hands-on experience fine-tuning open-source LLMs or leading bespoke LLM fine-tuning projects using PyTorch/JAX. - Research and practical experience in building applications and evaluations related to LLM-based agents, including tool-use, text-to-SQL, browser agents, coding agents, and GUI agents. - Experience with agent frameworks such as OpenHands, Swarm, LangGraph, or similar. - Familiarity with advanced agentic reasoning techniques such as STaR and PLANSEARCH. - Proficiency in cloud-based ML development, with experience in AWS or GCP environments. Our research interviews are designed to assess candidates' ability to prototype and debug ML models, their depth of understanding in research concepts, and their alignment with our organizational culture. We do not conduct LeetCode-style problem-solving assessments. Compensation packages at Scale for eligible roles include base salary, equity, and benef
Model Policy, Chemical & Biological Risk
About the Team Our Safety Systems https://openai.com/safety/safety-systems team is at the forefront of OpenAI's mission to build and deploy safe AGI, driving our commitment to AI safety and fostering a culture of trust and transparency. The Model Policy team aligns model behavior with desired human values and norms. We co-design policy with models and for models by driving rapid policy taxonomy iteration based on data and defining evaluation criteria for foundational models’ ability to reason about safety. Key focus areas include: catastrophic risk, mental health, teen safety and multimodal safety. About the Role Providing access to frontier AI systems raises complex questions around dual-use science and catastrophic risk. How should models respond to requests involving chemical synthesis, biological experimentation, or pathogen research? Where is the boundary between legitimate scientific inquiry and information that could enable misuse? How do we design policies that meaningfully reduce risk without unnecessarily restricting beneficial research? This is a senior role in which you’ll help shape policy creation and development at OpenAI for addressing biological and chemical risks. You will develop structured policy frameworks and taxonomies to guide safe model behavior. This role sits at the intersection of biosecurity expertise, AI safety research, and policy design. You will help ensure that frontier AI systems can support beneficial life sciences research, such as drug discovery, public health, and biosafety, while reducing the risk that these capabilities could be misused. 
Our relevant publications: - Preparedness framework https://openai.com/index/updating-our-preparedness-framework/ - Preparing for future AI capabilities in biology https://openai.com/index/preparing-for-future-ai-capabilities-in-biology/ - Safety evaluations hub https://openai.com/safety/evaluations-hub/ - OpenAI GPT5 System Card https://openai.com/index/gpt-5-system-card/ - Evaluating Fairness in ChatGPT https://openai.com/index/evaluating-fairness-in-chatgpt/ - Improving Model Safety Behavior with Rule-Based Rewards https://openai.com/index/improving-model-safety-behavior-with-rule-based-rewards/ - OpenAI Model Spec https://openai.com/index/introducing-the-model-spec/ Your Responsibilities: - Design and maintain model policies governing chemical and biological risk, defining how models should safely handle dual-use scenarios. - Develop structured taxonomies of chemical and biological risk that inform model training data, evaluation benchmarks, and safety monitoring systems. - Translate biosecurity and chemical security expertise into actionable model behavior, working closely with research and engineering teams to operationalize policy in training and evaluation pipelines. - Develop a broad range of subject matter expertise while maintaining agility across topics. - Identify emerging risk vectors where frontier AI capabilities could meaningfully lower barriers to harmful activity and develop mitigation strategies. - Engage with internal and external subject-matter experts in biosecurity, biodefense, and chemical safety to ensure policies reflect real-world risk landscapes. You might thrive in this role if you: - Have strong domain expertise in chemistry, biology, biosecurity, or related fields and are motivated to translate that expertise into principled, operational policies that scale to frontier AI systems. 
- Have experience researching or working with LLMs, machine learning, AI governance, technology policy, or related areas, and enjoy tackling structured reasoning and classification problems—such as defining boundaries between legitimate scientific inquiry and potentially harmful applications. - Have experience designing, refining, or enforcing policies or safeguards for complex systems, whether in AI/ML environments, scientific research governance, national security contexts, or other high-stakes technical domains. - Are comfortable navigating a
Machine Learning Research Engineer, Agents - Enterprise GenAI
AI is becoming vitally important in every function of our society. At Scale, our mission is to accelerate the development of AI applications. For 9 years, Scale has been the leading AI data foundry, helping fuel the most exciting advancements in AI, including generative AI, defense applications, and autonomous vehicles. With our recent investment from Meta, we are doubling down on building out state of the art post-training algorithms to reach the performance necessary for complex agents in enterprises around the world. The Enterprise ML Research Lab works on the front lines of this AI revolution. We are working on an arsenal of proprietary research, tools, and resources that serve all of our enterprise clients. As an Agent MLRE, you will be working on applying our Agent RL Training + Building algorithms to real life enterprise datasets across our clients + benchmarks. This will involve creating best-in-class Agents that achieve state of the art results through a combination of post-training + agent-building algorithms. If you are excited about shaping the future of the modern GenAI movement, we would love to hear from you! You will: - Train state of the art models, developed both internally and from the community, to deploy to our enterprise customers. - Research cutting edge algorithms to integrate directly into our training stack. - Build agents that leverage our proprietary agent-building algorithms to automatically hill climb datasets – including defining highly performant tools, multi-agent systems, and complex rewards. Ideally you’d have: - 1-3 years of building with LLMs in a production environment - Experience with post-training methods like RLHF/RLVR and related algorithms like PPO/GRPO etc. - Publications in top conferences such as NEURIPS, ICLR, or ICML within the last two years - PhD or Masters in Computer Science or a related field Compensation packages at Scale for eligible roles include base salary, equity, and benefits. 
The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend. Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in the locations of San Francisco, New York, Seattle is: $250,000 — $350,000 USD PLEASE NOTE:
Frontier Agents Engineer
About Scale AI Scale AI is the data foundation for AI, helping organizations build and deploy reliable production AI applications. We partner with leading enterprises and government organizations to accelerate their AI initiatives through our data annotation platform, generative AI solutions, and enterprise AI capabilities. Role Overview As a Forward Deployed AI Engineer on our Enterprise team, you'll be the technical bridge between Scale AI's cutting-edge AI capabilities and our most strategic customers. You'll work with enterprise clients to understand their unique challenges, architect custom AI solutions, and ensure successful deployment and adoption of AI systems in production environments. This is a hands-on technical role that combines deep engineering expertise with customer-facing problem solving. You'll work directly with customer engineering teams to integrate AI into their critical workflows. Key Responsibilities Customer Integration & Deployment - Partner directly with enterprise customers to understand their technical infrastructure, data pipelines, and business requirements - Design and implement custom integrations between Scale AI's platform and customer data environments (cloud platforms, data warehouses, internal APIs) - Build robust data connectors and ETL pipelines to ingest, process, and prepare customer data for AI workflows - Deploy and configure AI models and agents within customer security and compliance boundaries AI Agent Development - Develop production-grade AI agents tailored to customer use cases across domains like customer support, data analysis, content generation, and workflow automation - Architect multi-agent systems that orchestrate between different models, tools, and data sources - Implement evaluation frameworks to measure agent performance and iterate toward business objectives - Design human-in-the-loop workflows and feedback mechanisms for continuous agent improvement Prompt Engineering & Optimization - Create 
sophisticated prompt engineering strategies optimized for customer-specific domains and data - Build and maintain prompt libraries, templates, and best practices for customer use cases - Conduct systematic prompt experimentation and A/B testing to improve model outputs - Implement RAG (Retrieval Augmented Generation) systems and fine-tuning pipelines where appropriate Technical Leadership & Collaboration - Serve as the primary technical point of contact for strategic enterprise accounts - Collaborate with customer data scientists, ML engineers, and software developers to ensure smooth integration - Provide technical training and knowledge transfer to customer teams - Work closely with Scale's product and engineering teams to translate customer needs into product improvements - Document technical architectures, integration patterns, and best practices Problem Solving & Innovation - Debug complex technical issues across the entire stack, from data pipelines to model outputs - Rapidly prototype solutions to unblock customers and prove out new use cases
Staff Frontier Agents Engineer
About Scale AI Scale AI is the data foundation for AI, helping organizations build and deploy reliable production AI applications. We partner with leading enterprises and government organizations to accelerate their AI initiatives through our data annotation platform, generative AI solutions, and enterprise AI capabilities. Role Overview As a Staff Forward Deployed AI Engineer on our Enterprise team, you'll be the technical bridge between Scale AI's cutting-edge AI capabilities and our most strategic customers. You'll work with enterprise clients to understand their unique challenges, architect custom AI solutions, and ensure successful deployment and adoption of AI systems in production environments. This is a hands-on technical role that combines deep engineering expertise with customer-facing problem solving. You'll work directly with customer engineering teams to integrate AI into their critical workflows. Key Responsibilities Customer Integration & Deployment - Partner directly with enterprise customers to understand their technical infrastructure, data pipelines, and business requirements - Design and implement custom integrations between Scale AI's platform and customer data environments (cloud platforms, data warehouses, internal APIs) - Build robust data connectors and ETL pipelines to ingest, process, and prepare customer data for AI workflows - Deploy and configure AI models and agents within customer security and compliance boundaries AI Agent Development - Develop production-grade AI agents tailored to customer use cases across domains like customer support, data analysis, content generation, and workflow automation - Architect multi-agent systems that orchestrate between different models, tools, and data sources - Implement evaluation frameworks to measure agent performance and iterate toward business objectives - Design human-in-the-loop workflows and feedback mechanisms for continuous agent improvement Prompt Engineering & Optimization - Create 
sophisticated prompt engineering strategies optimized for customer-specific domains and data - Build and maintain prompt libraries, templates, and best practices for customer use cases - Conduct systematic prompt experimentation and A/B testing to improve model outputs - Implement RAG (Retrieval Augmented Generation) systems and fine-tuning pipelines where appropriate Technical Leadership & Collaboration - Serve as the primary technical point of contact for strategic enterprise accounts - Collaborate with customer data scientists, ML engineers, and software developers to ensure smooth integration - Provide technical training and knowledge transfer to customer teams - Work closely with Scale's product and engineering teams to translate customer needs into product improvements - Document technical architectures, integration patterns, and best practices Problem Solving & Innovation - Debug complex technical issues across the entire stack, from data pipelines to model outputs - Rapidly prototype solutions to unblock customers and prove out new use cases
Senior / Staff Machine Learning Research Scientist, Agents
About Scale At Scale AI, our mission is to accelerate the development of AI applications. For 8 years, Scale has been the leading AI data foundry, helping fuel the most exciting advancements in AI, including: generative AI, defense applications, and autonomous vehicles. With our recent Series F round, we’re accelerating the abundance of frontier data to pave the road to Artificial General Intelligence (AGI), and building upon our prior model evaluation work with enterprise customers and governments, to deepen our capabilities and offerings for both public and private evaluations. About the ACE team The Agent Capabilities & Environments (ACE) team, part of Scale’s Research organization, brings together customer-facing Researchers and Applied AI Engineers. Our core mission includes research on agent environments and RL reward signals, benchmarking autonomous agent performance across real-world scenarios and environments, creating robust data programs to improve Large Language Models (LLMs) agentic capabilities and building foundational tools and frameworks for evaluating models as agents. ACE focuses on autonomous agents that dynamically interact with diverse external environments, including code repositories, GUI interfaces, browsers, and more. About This Role This role is at the intersection of cutting-edge AI research and practical application, with a focus on studying the data types essential for building state-of-the-art agents, such as browser and SWE agents. The ideal candidate will explore the data landscape needed to advance intelligent, adaptable AI agents, guiding the data strategy at Scale to drive innovation. This position requires not only expertise in LLM agents and planning algorithms but also creativity in addressing novel challenges related to data, interaction, and evaluation. 
You will contribute to impactful research publications on agents, collaborate with customer researchers, and work alongside the engineering team to translate these advancements into real-world, scalable solutions. Ideally you’d have: - Practical experience working with LLMs, with proficiency in frameworks like PyTorch, JAX, or TensorFlow. You should also be adept at interpreting research literature and quickly turning new ideas into prototypes. - A track record of published research in top ML venues (e.g., ACL, EMNLP, NAACL, NeurIPS, ICML, ICLR, COLM, etc.) - At least three years of experience addressing sophisticated ML problems, either in a research setting or product development. - Strong written and verbal communication skills and the ability to operate cross-functionally. Nice to have: - Hands-on experience with open source LLM fine-tuning or involvement in bespoke LLM fine-tuning projects using PyTorch/JAX. - Hands-on experience and publications in building applications and evaluations related to AI agents such as tool-use, text2SQL, browser agents, coding agents and GUI agents. - Hands-on experience with agent frameworks such as OpenHands, Swarm, LangGraph, etc. - Familiarity with agentic reasoning methods such as STaR and PLANSEARCH - Experience working with a cloud technology stack (e.g., AWS or GCP) and developing machine learning models in a cloud environment. Our research interviews are crafted to assess candidates' skills in practical ML prototyping and debugging, their grasp of research concepts, and their alignment with our organizational culture. We will not ask any Lee
Machine Learning Research Scientist, Post-Training
Scale works with the industry’s leading AI labs to provide high quality data and accelerate progress in GenAI research. We are looking for Research Scientists and Research Engineers with expertise in LLM post-training (SFT, RLHF, reward modeling). This role will focus on optimizing data curation and eval to enhance LLM capabilities in both text and multimodal modalities. In this role, you will develop novel methods to improve the alignment and generalization of large-scale generative models. You will collaborate with researchers and engineers to define best practices in data-driven AI development. You will also partner with top foundation model labs to provide both technical and strategic input on the development of the next generation of generative AI models. You will: - Research and develop novel post-training techniques, including SFT, RLHF, and reward modeling, to enhance LLM core capabilities in both text and multimodal modalities. - Design and experiment with new approaches to preference optimization. - Analyze model behavior, identify weaknesses, and propose solutions for bias mitigation and model robustness. - Publish research findings in top-tier AI conferences. Ideally you’d have: - Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field. - Deep understanding of deep learning, reinforcement learning, and large-scale model fine-tuning. - Experience with post-training techniques such as RLHF, preference modeling, or instruction tuning. - Excellent written and verbal communication skills - Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals - Previous experience in a customer facing role. Compensation packages at Scale for eligible roles include base salary, equity, and benefits. 
The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend. Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in the locations of San Francisco, New York, Seattle is: $252,000 — $315,000 USD PLEASE NOTE: Our policy requires a 90-day waiting period before reconsidering candidates for the same role.
Product Manager of AI Applications, Global Public Sector
Scale is growing rapidly, and joining the Global Public Sector team is an opportunity to work on one of the most exciting and quickly expanding teams at Scale. This team is responsible for generating, executing, and fostering Scale’s work with governments and government-backed entities outside of the United States. We develop bespoke solutions that leverage our customers’ proprietary data and expertise to transform their organizations with AI. We work with them to understand their pain points and workflows and then forward deploy our team to build cutting-edge solutions. The applications we build are powered by the Scale GenAI Platform, a full stack product to build, test and deploy frontier AI agents. - Developing custom AI applications - Building custom LLMs - Providing high-quality training data for research and government institutions building LLMs - Developing partnerships to foster regional talent growth and AI adoption We are looking for an entrepreneurial and experienced product leader to play a pivotal role in the ideation and development of transformative AI solutions. The ideal candidate has deep experience with AI/ML application development, can think strategically about how to solve a problem, is an excellent listener, is comfortable getting into the weeds operationally, and has a strong understanding of software engineering principles and practices. You will be responsible for owning large AI projects for one or many customers. You will lead a cross-functional team of engineers, MLEs, and operators to build a highly impactful solution for our customers that will drive millions in revenue for our business as well. 
Responsibilities: - Lead design workshops with the client to define custom AI solutions - Scope out new AI application use cases across various government entities - Lead cross-functional development of AI applications and custom LLMs with diverse stakeholders (Engineering + Ops + Go-to-Market) - Consistently engage with future end-users to solicit feedback and ensure we are prioritizing effectively - Stay up to date with latest research in applied AI and training custom LLMs - Scope out model evaluation sets and performance requirements, consistently review results, and iterate on the solution - Give regular progress updates to the client and Global Public Sector leadership Minimum Qualifications: - 4+ years of experience building products with specific experience within the last 1-2 years building AI-powered products - Strong technical background (STEM degree) and/or experience building technical software products - Strong understanding of generative AI technologies and their applications in both enterprise and consumer settings - Experience with vibe coding tools (e.g., Replit, Lovable, Bolt) and design tools (e.g., Figma, Canva, Miro) - Exceptional leadership, presentation and communication skills with the ability to influence cross-functional teams Nice to haves: - Coding experience (Python) - Proficiency in Arabic, both written and spoken PLEASE NOTE: Our policy requires a 90-day waiting period before reconsidering candidates for the same role. This allows us to ensure a fair and thorough evaluation of all ap
Research Scientist, Safety Post Training
Scale Labs, Research Scientist — Safety Post Training As the leading data and evaluation partner for frontier AI companies, Scale plays an integral role in understanding the capabilities and safeguarding AI models and systems. Building on this expertise, Scale Labs has launched a new team focused on policy research, to bridge the gap between AI research and global policymakers to make informed, scientific decisions about AI risks and capabilities. Our research tackles the hardest problems in agent robustness, AI control protocols, and AI risk evaluations to help governments, industry, and the public understand and mitigate AI risk while maximizing AI adoption. This team collaborates broadly across industry, the public sector, and academia and regularly publishes our findings. We are actively seeking talented researchers to join us in shaping this vision. As a Research Scientist working on Safety Post-Training you will develop and apply post-training methods and interpretability techniques to make frontier AI systems safer and better understood by researchers and policymakers. For example, you might: - Design and run post-training pipelines to study how training choices affect model safety, robustness, and alignment properties; - Develop interpretability-informed evaluations that reveal how and why models produce unsafe, deceptive, or otherwise undesirable behaviors, and use those insights to guide targeted mitigations; - Collaborate with policymakers, engineers, and other researchers to translate post-training and interpretability findings into actionable safety standards, evaluation benchmarks, and best practices. Ideally you’d have: - Commitment to our mission of promoting safe, secure, and trustworthy AI deployments in the industry as frontier AI capabilities continue to advance. - Experience with post-training and RL techniques such as RLHF, DPO, GRPO, and similar approaches. 
- A track record of published research in machine learning, particularly in generative AI. - At least three years of experience addressing sophisticated ML problems, whether in a research setting or in product development. - Strong written and verbal communication skills to operate in a cross-functional team. Nice to have: - Experience with mechanistic interpretability, probing, or other techniques for understanding model internals. - Familiarity with red-teaming or adversarial evaluation of post-trained models. - Experience studying failure modes introduced or masked by post-training, such as reward hacking, sycophancy, or alignment faking. Our research interviews are crafted to assess candidates' skills in practical ML prototyping and debugging, their grasp of research concepts, and their alignment with our organizational culture. We will not ask any LeetCode-style questions. If you’re excited about advancing AI safety and contributing to our mission, we encourage you to apply, even if your experience doesn’t perfectly align with every requirement. Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional facto
Privacy Research Engineer, Safeguards
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the Role We are looking for researchers to help mitigate the risks that come with building AI systems. One of these risks is the potential for models to interact with private user data. In this role, you'll design and implement privacy-preserving techniques, audit our current techniques, and set the direction for how Anthropic handles privacy more broadly. Responsibilities: - Lead our privacy analysis of frontier models, carefully auditing the use of data and ensuring safety throughout the process - Develop privacy-first training algorithms and techniques - Develop evaluation and auditing techniques to measure the privacy of training algorithms - Work with a small, senior team of engineers and researchers to enact a forward-looking privacy policy - Advocate on behalf of our users to ensure responsible handling of all data You may be a good fit if you have: - Experience working on privacy-preserving machine learning - A track record of shipping products and features inside a fast-moving environment - Strong coding skills in Python and familiarity with ML frameworks like PyTorch or JAX. 
- Deep familiarity with large language models, how they work, and how they are trained - Experience working with privacy-preserving techniques (e.g., differential privacy and how it is different from k-anonymity, l-diversity, and t-closeness) - Experience supporting fast-paced startup engineering teams - Demonstrated success in bringing clarity and ownership to ambiguous technical problems - Proven ability to lead cross-functional security initiatives and navigate complex organizational dynamics Strong candidates may also: - Have published papers on the topic of privacy-preserving ML at top academic venues - Have prior experience training large language models (e.g., collecting training datasets, pre-training models, post-training models via fine-tuning and RL, running evaluations on trained models) - Have prior experience developing tooling to support privacy-preserving ML (e.g., differential privacy in TF-Privacy or Opacus) The annual compensation range for this role is listed below. For sales roles, the range provided is the role’s On Target Earnings ("OTE") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role. Annual Salary: $320,000 — $485,000 USD Logistics Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
Machine Learning Systems Research Engineer, Agent Post-training - Enterprise GenAI
AI is becoming vitally important in every function of our society. At Scale, our mission is to accelerate the development of AI applications. For 9 years, Scale has been the leading AI data foundry, helping fuel the most exciting advancements in AI, including generative AI, defense applications, and autonomous vehicles. With our recent investment from Meta, we are doubling down on building out state of the art post-training algorithms to reach the performance necessary for complex agents in enterprises around the world. The Enterprise ML Research Lab works on the front lines of this AI revolution. We are working on an arsenal of proprietary research and resources that serve all of our enterprise clients. As an ML Sys Research Engineer, you’ll work on building out the algorithms for our next-gen Agent RL training platform, support large scale training, and research and integrate state-of-the-art technologies to optimize our ML system. Your customer will be other MLREs and AAIs on the Enterprise AI team who are taking the training algorithms and applying them to client use-cases ranging from next-generation AI cybersecurity firewall LLMs to training foundation healthtech search models. If you are excited about shaping the future of the modern AI movement, we would love to hear from you! You will: - Build, profile and optimize our training and inference framework. - Post-train state of the art models, developed both internally and from the community, to define stable post-training recipes for our enterprise engagements. - Collaborate with ML teams to accelerate their research and development, and enable them to develop the next generation of models and data curation. - Create a next-gen agent training algorithm for multi-agent/multi-tool rollouts. Ideally you’d have: - At least 1-3 years of LLM training in a production environment - Passionate about system optimization - Experience with post-training methods like RLHF/RLVR and related algorithms like PPO/GRPO etc. 
- Ability to demonstrate know-how on how to operate the architecture of the modern GPU cluster - Experience with multi-node LLM training and inference - Strong software engineering skills, proficient in frameworks and tools such as CUDA, PyTorch, transformers, flash attention, etc. - Strong written and verbal communication skills to operate in a cross functional team environment. - PhD or Masters in Computer Science or a related field Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligibl
Senior Frontier Agents Engineer
About Scale AI Scale AI is the data foundation for AI, helping organizations build and deploy reliable production AI applications. We partner with leading enterprises and government organizations to accelerate their AI initiatives through our data annotation platform, generative AI solutions, and enterprise AI capabilities. Role Overview As a Senior Forward Deployed AI Engineer on our Enterprise team, you'll be the technical bridge between Scale AI's cutting-edge AI capabilities and our most strategic customers. You'll work with enterprise clients to understand their unique challenges, architect custom AI solutions, and ensure successful deployment and adoption of AI systems in production environments. This is a hands-on technical role that combines deep engineering expertise with customer-facing problem solving. You'll work directly with customer engineering teams to integrate AI into their critical workflows. Key Responsibilities Customer Integration & Deployment - Partner directly with enterprise customers to understand their technical infrastructure, data pipelines, and business requirements - Design and implement custom integrations between Scale AI's platform and customer data environments (cloud platforms, data warehouses, internal APIs) - Build robust data connectors and ETL pipelines to ingest, process, and prepare customer data for AI workflows - Deploy and configure AI models and agents within customer security and compliance boundaries AI Agent Development - Develop production-grade AI agents tailored to customer use cases across domains like customer support, data analysis, content generation, and workflow automation - Architect multi-agent systems that orchestrate between different models, tools, and data sources - Implement evaluation frameworks to measure agent performance and iterate toward business objectives - Design human-in-the-loop workflows and feedback mechanisms for continuous agent improvement Prompt Engineering & Optimization - Create 
sophisticated prompt engineering strategies optimized for customer-specific domains and data - Build and maintain prompt libraries, templates, and best practices for customer use cases - Conduct systematic prompt experimentation and A/B testing to improve model outputs - Implement RAG (Retrieval Augmented Generation) systems and fine-tuning pipelines where appropriate Technical Leadership & Collaboration - Serve as the primary technical point of contact for strategic enterprise accounts - Collaborate with customer data scientists, ML engineers, and software developers to ensure smooth integration - Provide technical training and knowledge transfer to customer teams - Work closely with Scale's product and engineering teams to translate customer needs into product improvements - Document technical architectures, integration patterns, and best practices Problem Solving & Innovation - Debug complex technical issues across the entire stack, from data pipelines to model outputs - Rapidly prototype solutions to unblock customers and prove out new use cases
Senior Machine Learning Engineer - Model Evaluations, Public Sector
The Public Sector ML team at Scale deploys advanced AI systems—including LLMs, agentic models, and multimodal pipelines—into mission-critical government environments. We build evaluation frameworks that ensure these models operate reliably, safely, and effectively under real-world constraints. As an ML Engineer, you will design, implement, and scale automated evaluation pipelines that help customers trust and operationalize advanced AI systems across defense, intelligence, and federal missions. You will: - Develop and maintain automated evaluation pipelines for ML models across functional, performance, robustness, and safety metrics, including LLM-judge–based evaluations. - Design test datasets and benchmarks to measure generalization, bias, explainability, and failure modes. - Build evaluation frameworks for LLM agents, including infrastructure for scenario-based and environment-based testing. - Conduct comparative analyses of model architectures, training procedures, and evaluation outcomes. - Implement tools for continuous monitoring, regression testing, and quality assurance for ML systems. - Design and execute stress tests and red-teaming workflows to uncover vulnerabilities and edge cases. - Collaborate with operations teams and subject matter experts to produce high-quality evaluation datasets. - Comfortable with light travel (approximately 10%) for customer interaction and team needs. This role will require an active security clearance or the ability to obtain a security clearance. Ideally you’d have: - Experience in computer vision, deep learning, reinforcement learning, or NLP in production settings. - Strong programming skills in Python; experience with TensorFlow or PyTorch. - Background in algorithms, data structures, and object-oriented programming. - Experience with LLM pipelines, simulation environments, or automated evaluation systems. 
- Ability to convert research insights into measurable evaluation criteria. Nice to haves: - Graduate degree in CS, ML, or AI. - Cloud experience (AWS, GCP) and model deployment experience. - Experience with LLM evaluation, CV robustness, or RL validation. - Knowledge of interpretability, adversarial robustness, or AI safety frameworks. - Familiarity with ML evaluation frameworks and agentic model design. - Experience in regulated, classified, or mission-critical ML domains. Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be elig
Machine Learning Fellow - Human Frontier Collective (US)
PLEASE NOTE: This is a fully remote, 1099 independent contractor opportunity with an estimated duration of six months and the potential for extension. To be eligible, candidates must be authorized to work in the United States; visa sponsorship is not available for this role. About the Program The Human Frontier Collective (HFC) Fellowship brings together top researchers and domain experts to collaborate on high-impact work that is shaping the future of AI. As an HFC Fellow, you’ll apply your academic and professional expertise to help design, evaluate, and interpret advanced generative AI systems, while gaining exposure to cutting-edge research and working alongside an interdisciplinary network of leading thinkers. What You'll Do - ML Projects: Get invited to engage in high-impact projects with our partnered AI labs and platforms. Help models understand real-world deep learning workflows by designing, reviewing, and optimizing PyTorch models, evaluating complex ML code and AI-generated implementations for efficiency and correctness, and advising on GPU optimization, scaling, and trade-offs. - HFC Community: Beyond the work, you’ll become part of a supportive, interdisciplinary network of innovators and thought leaders committed to advancing frontier AI across domains. - Contribute to Research Publications: Collaborate with Scale’s research team to co-author technical reports and research papers, boosting your academic visibility and professional recognition (e.g., SciPredict, PropensityBench, Professional Reasoning Benchmark). Who Should Apply - Education: PhD or postdoctoral degree in Computer Science, Computer Engineering, or a related field. - Professional Background: 1-3+ years of experience as a Machine Learning Engineer or Data Scientist. - Skills: Strong proficiency in Python and modern ML frameworks (PyTorch, TensorFlow). Experience with cloud infrastructure (AWS) and MLOps tools (Docker, LangChain) is a plus.
- Professional Mindset: Detail-oriented, innovative thinker with a passion for applied AI research and a commitment to collaboration. Why Join the HFC? - Professional Development: High-impact experts expand their influence through review projects, advisory roles, and research, while deepening their AI expertise, strengthening analytical and problem-solving skills, and engaging with pioneering AI applications in science and technology. - Join a Top-Tier Network: Collaborate with a global network of engineers and experts to advance responsible AI through impactful, flexible research and training. 80% of our members come from leading institutions. - Flexible Schedule: Set your own schedule, with flexible 10–40 hour weeks that fit around your life and other commitments. - Competitive Pay: Project pay rates vary across platforms and depend on a number of factors, including but not li
Research Engineer/Research Scientist, Audio
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. Anthropic’s Audio team pushes the boundaries of what's possible with audio with large language models. We care about making safe, steerable, reliable systems that can understand and generate speech and audio, prioritizing not only naturalness but also steerability and robustness. As a researcher on the Audio team, you'll work across the full stack of audio ML, developing audio codecs and representations, sourcing and synthesizing high quality audio data, training large-scale speech language models and large audio diffusion models, and developing novel architectures for incorporating continuous signals into LLMs. Our team focuses primarily but not exclusively on speech, building advanced steerable systems spanning end-to-end conversational systems, speech and audio understanding models, and speech synthesis capabilities. The team works closely with many collaborators across pretraining, finetuning, reinforcement learning, production inference, and product to get advanced audio technologies from early research to high impact real-world deployments. 
You may be a good fit if you: - Have hands-on experience with training audio models, whether that's conversational speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, or generative audio models - Genuinely enjoy both research and engineering work, and you'd describe your ideal split as roughly 50/50 rather than heavily weighted toward one or the other - Are comfortable working across abstraction levels, from signal processing fundamentals to large-scale model training and inference optimization - Have deep expertise with JAX, PyTorch, or large-scale distributed training, and can debug performance issues across the full stack - Thrive in fast-moving environments where the most important problem might shift as we learn more about what works - Communicate clearly and collaborate effectively; audio touches many parts of our systems, so you'll work closely with teams across the company - Are passionate about building conversational AI that feels natural, steerable, and safe - Care about the societal impacts of voice AI and want to help shape how these systems are developed responsibly Strong candidates may also have experience with: - Large language model pretraining and finetuning - Training diffusion models for image and audio generation - Reinforcement learning for large language models and diffusion models - End-to-end system optimization, from performance benchmarking to kernel optimization - GPUs, Kubernetes, PyTorch, or distributed training infrastructure Representative projects: - Training state-of-the-art neural audio codecs for 48 kHz stereo audio - Developing novel algorithms for diffusion pretraining and reinforcement learning - Scaling audio datasets to millions of hours of high quality audio - Creating robust evaluation methodologies for hard-to-measure qualities such as naturalness or expressivene
Machine Learning Engineer, Global Public Sector
Scale’s mission is to develop reliable AI systems for the world's most important decisions. Our core work consists of: - Creating custom AI applications that will impact millions of citizens - Generating high-quality training data for national LLMs - Upskilling and advisory services to spread the impact of AI Scale is hiring ML Research Engineers to bridge the gap between frontier research and real-world impact. While we solve critical challenges for global governments, your role will extend beyond implementation. You will lead the charge in research into Agent design, Deep Research and AI Safety/reliability, developing novel methodologies that not only power public sector applications but set new standards across the entire Scale organisation. Your mission is threefold: - Frontier Research & Publication: Leading research into LLM/agent capabilities, reasoning, and safety, with the goal of publishing at top-tier venues (NeurIPS, ICML, ICLR). - Cross-Org Impact: Developing generalised techniques in Agent design, AI Safety and Deep Research agents that scale across our commercial and government platforms. - Mission-Critical Applications: Engineering high-stakes AI systems that impact millions of citizens globally. You will: - Pioneer Novel Architectures: Design and train state-of-the-art models and agents, moving beyond “off-the-shelf” solutions to create custom architectures for complex public sector reasoning tasks. - Lead AI Safety Initiatives: Research and implement robust safety frameworks, including red teaming, alignment (RLHF/DPO), and bias mitigation strategies essential for sovereign AI. - Drive Deep Research Capabilities: Develop agents capable of long-horizon reasoning and autonomous information synthesis to solve complex problems for national security and public policy. - Publish and Contribute: Represent Scale in the broader research community by publishing high-impact papers and contributing to open-source breakthroughs. 
- Consult as a Subject Matter Expert: Act as a technical authority for public sector leaders, advising on the theoretical limits and safety requirements of emerging AI. - Build Evaluation Frontiers: Create new benchmarks and evaluation protocols that define what success looks like for high-stakes, non-commercial AI applications. Ideally, you’d have: - Advanced Degree: PhD or Master’s in Computer Science, Mathematics, or a related field with a focus on Deep Learning. - Research Track Record: A portfolio of first-author publications at major conferences (NeurIPS, ICML, CVPR, EMNLP, etc.). - Engineering Rigour: Strong proficiency in Python, deep learning frameworks (PyTorch/JAX), with the ability to write production-ready code that scales. - Safety Expertise: Experience in alignment, robustness, or interpretability research. Nice to haves: - Experience with large-scale distributed training on massive clusters. - Experience in building agentic systems that are reliable. - Experience in
RE / RS - Foundations, Search
About the Team The Foundations Research team works on high-risk, high-reward ideas that could shape the next decade of AI. Our goal is to advance the science and data that enable our training and scaling efforts, with a particular focus on future frontier models. We push the boundaries of data, scaling laws, optimization techniques, model architectures, and efficiency improvements to propel our science. The Search team sits within Foundations, building agentic search by co-designing model–system interfaces with the core search stack (serving, indexing, retrieval) to translate model intent into reliable, real-world actions. Operating at the frontier of AI and information retrieval, the team develops large-scale systems that transform and index vast corpora, enabling models to reason over global knowledge and act dependably. In close partnership with researchers, we rapidly bring modeling breakthroughs into production and redefine how intelligent systems discover, retrieve, and synthesize information at planetary scale. About the Role We’re looking for a researcher focused on our embedding retrieval efforts. You’ll work with a team of world-class research scientists and engineers developing foundational technology that enables models to retrieve and condition on the right information, at the right time. This includes designing new embedding training objectives, scalable vector store architectures, and dynamic indexing methods. This work will support retrieval across many OpenAI products and internal research efforts, with opportunities for scientific publication and deep technical impact. This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees. Responsibilities - Tackle embedding models and retrieval systems optimized for grounding, relevance, and adaptive reasoning.
- Collaborate with a team of researchers and engineers building end-to-end infrastructure for training, evaluating, and integrating embeddings into frontier models. - Drive innovation in dense, sparse, and hybrid representation techniques, metric learning, and learning-to-retrieve systems. - Collaborate closely with Pretraining, Inference, and other Research teams to integrate retrieval throughout the model lifecycle - Contribute to OpenAI’s long-term vision of AI systems with memory and knowledge access capabilities rooted in learned representations. You Might Thrive in This Role If You Have - Proven experience leading high-performance teams of researchers or engineers in ML infrastructure or foundational research. - Deep technical expertise in representation learning, embedding models, or vector retrieval systems. - Familiarity with transformer-based LLMs and how embedding spaces can interact with language model objectives. - Research experience in areas such as contrastive learning, supervised or unsupervised embedding learning, or metric learning. - A track record of building or scaling large machine learning systems, particularly embedding pipelines in production or research contexts. - A first-principles mindset for challenging assumptions about how retrieval and memory should work for large models. About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. 
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. F
Staff Machine Learning Research Engineer, Agent Post-training - Enterprise GenAI
AI is becoming vitally important in every function of our society. At Scale, our mission is to accelerate the development of AI applications. For 9 years, Scale has been the leading AI data foundry, helping fuel the most exciting advancements in AI, including generative AI, defense applications, and autonomous vehicles. With our recent investment from Meta, we are doubling down on building out state of the art post-training algorithms to reach the performance necessary for complex agents in enterprises around the world. The Enterprise ML Research Lab works on the front lines of this AI revolution. We are working on an arsenal of proprietary research, tools, and resources that serve all of our enterprise clients. As a Staff Agent Post-Training MLRE, you will build out our next-gen Agent RL training platform. You’ll build out the platform that will train best-in-class Agents that achieve state of the art results on real enterprise use-cases. You’ll integrate cutting edge research into our training stack, enabling MLREs on the Enterprise AI team to deploy use-cases ranging from next-generation AI cybersecurity firewall LLMs to training foundation healthtech search models. If you are excited about shaping the future of the modern GenAI movement, we would love to hear from you! You will: - Train state of the art models, developed both internally and from the community, to deploy to our enterprise customers. - Research cutting edge algorithms to integrate directly into our training stack. - Design solutions that enable complex multi-agent systems to directly learn from both process + outcome based rewards. Ideally you’d have: - 5+ years of LLM training in a production environment - Experience with post-training methods like RLHF/RLVR and related algorithms like PPO/GRPO etc. 
- Publications in top conferences such as NeurIPS, ICLR, or ICML within the last two years - PhD or Masters in Computer Science or a related field Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend. Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in the locations of San Francisco, New York, Seattle is: $250,000 - $350,000 USD
Technical Lead Manager, Physical AI
Scale AI is the data engine for the entire AI industry. Our mission is to accelerate the development of AI applications by providing organizations with the high-quality data they need. The Physical AI team at Scale is focused on the next frontier: building general AI that can reason and act in the physical world. By leveraging Scale’s massive data infrastructure, we are helping frontier labs build Foundation Models for Physical AI that will redefine the future of automation. Role Overview As the Technical Lead Manager (TLM) for the Physical AI team at Scale, you will bridge the gap between cutting-edge Machine Learning research and physical robot deployment. You will lead a high-performing team of Research Engineers while remaining a hands-on technical contributor (~60% of your time). Your primary focus will be the development and evaluation of Large-Scale Foundation Models (e.g., VLAs, world models) that allow robots and AVs to generalize across diverse tasks, environments, and morphologies. Key Responsibilities Technical Leadership & Research - Model Scaling: Direct research into scaling laws for Physical AI, determining how to best utilize massive datasets for pre-training and fine-tuning generalist policies. - VLA and World Model Development: Develop novel methods for building and evaluating models, including new Physical AI industry benchmarks. - Hands-on Modeling: Actively write code to implement, train, and test SOTA architectures. Conduct research on Physical AI data collection, cross-embodiment training, and policy fine-tuning. - Data Strategy: Collaborate with internal labeling teams to design "robotic-native" data pipelines, including the use of VLMs for automated trajectory annotation and data synthesis.
- Collaborate closely with customers to drive the industry forward in using Scale data Team Management & Execution - Mentorship: Lead and grow a team of 4-6 elite Physical AI researchers, fostering a culture of high-velocity experimentation and rigorous evaluation. - Paper-to-Product: Translate the latest research from NeurIPS, ICRA, and CVPR into production-ready features for Scale’s Physical AI partners. - Cross-functional Alignment: Work with cross-functional teams (e.g., Product and Operations) to bring our research breakthroughs into production. Required Qualifications AI/ML Excellence - Deep Learning Mastery: Expert-level proficiency in PyTorch, with deep knowledge of Transformer architectures, Attention mechanisms, and Self-Supervised Learning. - VLM/VLA Experience: Proven track record of working with Vision-Language Models (e.g., CLIP, PaLM-E) and adapting them for spatial reasoning or embodied tasks. - Generative AI: Experience with Diffusion Models for sequence generation or Generative World Models for predictive
Research Engineer, Pretraining Scaling - London
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the Role: Anthropic's ML Performance and Scaling team trains our production pretrained models, work that directly shapes the company's future and our mission to build safe, beneficial AI systems. As a Research Engineer on this team, you'll ensure our frontier models train reliably, efficiently, and at scale. This is demanding, high-impact work that requires both deep technical expertise and a genuine passion for the craft of large-scale ML systems. This role lives at the boundary between research and engineering. You'll work across our entire production training stack: performance optimization, hardware debugging, experimental design, and launch coordination. During launches, the team works in tight lockstep, responding to production issues that can't wait for tomorrow. 
Responsibilities: - Own critical aspects of our production pretraining pipeline, including model operations, performance optimization, observability, and reliability - Debug and resolve complex issues across the full stack—from hardware errors and networking to training dynamics and evaluation infrastructure - Design and run experiments to improve training efficiency, reduce step time, increase uptime, and enhance model performance - Respond to on-call incidents during model launches, diagnosing problems quickly and coordinating solutions across teams - Build and maintain production logging, monitoring dashboards, and evaluation infrastructure - Add new capabilities to the training codebase, such as long context support or novel architectures - Collaborate closely with teammates across SF and London, as well as with Tokens, Architectures, and Systems teams - Contribute to the team's institutional knowledge by documenting systems, debugging approaches, and lessons learned You May Be a Good Fit If You: - Have hands-on experience training large language models, or deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems - Genuinely enjoy both research and engineering work—you'd describe your ideal split as roughly 50/50 rather than heavily weighted toward one or the other - Are excited about being on-call for production systems, working long days during launches, and solving hard problems under pressure - Thrive when working on whatever is most impactful, even if that changes day-to-day based on what the production model needs - Excel at debugging complex, ambiguous problems across multiple layers of the stack - Communicate clearly and collaborate effectively, especially when coordinating across time zones or during high-stress incidents - Are passionate about the work itself and want to refine your craft as a research engineer - Care about the societal impacts of AI and responsible scaling Strong Candidates May Also Have: - Previous experience 
training LLMs or working extensively with JAX/TPU, PyTorch, or other ML frameworks at scale - Contributed to open-source LLM frame
Research Engineer / Scientist, Alignment Science
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role: You want to build and run elegant and thorough machine learning experiments to help us understand and steer the behavior of powerful AI systems. You care about making AI helpful, honest, and harmless, and are interested in the ways that this could be challenging in the context of human-level capabilities. You could describe yourself as both a scientist and an engineer. As a Research Engineer on Alignment Science, you'll contribute to exploratory experimental research on AI safety, with a focus on risks from powerful future systems (like those we would designate as ASL-3 or ASL-4 under our Responsible Scaling Policy), often in collaboration with other teams including Interpretability, Fine-Tuning, and the Frontier Red Team. Our blog provides an overview of topics that the Alignment Science team is either currently exploring or has previously explored. Our current topics of focus include... - Scalable Oversight: Developing techniques to keep highly capable models helpful and honest, even as they surpass human-level intelligence in various domains. - AI Control: Creating methods to ensure advanced AI systems remain safe and harmless in unfamiliar or adversarial scenarios. - Alignment Stress-testing: Creating model organisms of misalignment to improve our empirical understanding of how alignment failures might arise. - Automated Alignment Research: Building and aligning a system that can speed up & improve alignment research.
- Alignment Assessments: Understanding and documenting the highest-stakes and most concerning emerging properties of models through pre-deployment alignment and welfare assessments (see our
Machine Learning Fellow - Human Frontier Collective (Canada)
PLEASE NOTE: This is a fully remote, 1099 independent contractor opportunity with an estimated duration of six months and the potential for extension. To be eligible, candidates must be authorized to work in Canada. About the Program The Human Frontier Collective (HFC) Fellowship brings together top researchers and domain experts to collaborate on high-impact work that is shaping the future of AI. As an HFC Fellow, you’ll apply your academic and professional expertise to help design, evaluate, and interpret advanced generative AI systems, while gaining exposure to cutting-edge research and working alongside an interdisciplinary network of leading thinkers. What You'll Do - ML Projects: Get invited to engage in high-impact projects with our partnered AI labs and platforms. Help models understand real-world deep learning workflows by designing, reviewing, and optimizing PyTorch models, evaluating complex ML code and AI-generated implementations for efficiency and correctness, and advising on GPU optimization, scaling, and trade-offs. - HFC Community: Beyond the work, you’ll become part of a supportive, interdisciplinary network of innovators and thought leaders committed to advancing frontier AI across domains. - Contribute to Research Publications: Collaborate with Scale’s research team to co-author technical reports and research papers, boosting your academic visibility and professional recognition (e.g., SciPredict, PropensityBench, Professional Reasoning Benchmark). Who Should Apply - Education: PhD or postdoctoral degree in Computer Science, Computer Engineering, or a related field. - Professional Background: 1-3+ years of experience as a Machine Learning Engineer or Data Scientist. - Skills: Strong proficiency in Python and modern ML frameworks (PyTorch, TensorFlow). Experience with cloud infrastructure (AWS) and MLOps tools (Docker, LangChain) is a plus.
- Professional Mindset: Detail-oriented, innovative thinker with a passion for applied AI research and a commitment to collaboration. Why Join the HFC? - Professional Development: High-impact experts expand their influence through review projects, advisory roles, and research, while deepening their AI expertise, strengthening analytical and problem-solving skills, and engaging with pioneering AI applications in science and technology. - Join a Top-Tier Network: Collaborate with a global network of engineers and experts to advance responsible AI through impactful, flexible research and training. 80% of our members come from leading institutions. - Flexible Schedule: Set your own schedule, with flexible 10–40 hour weeks that fit around your life and other commitments. - Competitive Pay: Project pay rates vary across platforms and depend on a number of factors, including but not limited to projects, scope, skillset, and location.
Research Engineer / Scientist, Alignment Science - London
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role: You want to build and run elegant and thorough machine learning experiments to help us understand and steer the behavior of powerful AI systems. You care about making AI helpful, honest, and harmless, and are interested in the ways that this could be challenging in the context of human-level capabilities. You could describe yourself as both a scientist and an engineer. As a Research Engineer on Alignment Science, you'll contribute to exploratory experimental research on AI safety, with a focus on risks from powerful future systems (like those we would designate as ASL-3 or ASL-4 under our Responsible Scaling Policy), often in collaboration with other teams including Interpretability, Fine-Tuning, and the Frontier Red Team. Our blog provides an overview of topics that the Alignment Science team is either currently exploring or has previously explored. For the London team, we are opportunistically hiring for the following research areas: - AI Control: Creating methods to ensure advanced AI systems remain safe and harmless in unfamiliar or adversarial scenarios. - Alignment Stress-testing: Creating model organisms of misalignment to improve our empirical understanding of how alignment failures might arise. Note: Currently, the team's hub is in San Francisco, so we require all candidates to be based at least 25% in London and travel to San Francisco occasionally. Additionally, we are prioritizing growing our San Francisco teams, so you may not hear back on your application to the London team unless we see an unusually strong fit. For this role, we conduct all interviews in Python.
Representative Projects: - Testing the robustness of our safety techniques by training language models to subvert them, and seeing how effective they are at subverting our interventions. - Running multi-agent reinforcement learning experiments to test out techniques like AI Debate.
Prompt Engineer, Agent Prompts & Evals
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the Role We’re looking for prompt and context engineers to join our product engineering team to help build AI-first products, features, and evaluations. Your mission will be to bridge the gap between model capabilities and real product experience, working with product teams to build consistent, safe, and beneficial user experiences across all product surfaces. You will be deeply involved in new product feature and model releases at Anthropic, combining engineering expertise with an understanding of frontier AI applications and model quality. You’ll become an expert on Claude’s behavioral quirks and capabilities and apply that knowledge to deliver the best possible user experience across models and domains. You’ll be the first resource for product teams working on Claude’s AI infrastructure: system prompts, tool prompts, skills, and evaluations. This role requires someone who can effectively balance caring deeply about making Claude the best it can be while also supporting a wide variety of concurrent projects and efforts across many product teams. Key Responsibilities - Prompt Engineering Excellence: Design, test, and optimize system prompts and feature-specific prompts that shape Claude’s behavior across consumer and API products. - Evaluation Development: Build and maintain comprehensive evaluation suites that ensure model quality and consistency across product launches and updates. - Cross-functional Collaboration: Partner closely with product teams, research teams, and safeguards to ensure new features meet quality and safety standards. 
- Model Launch Support: Play a critical role in model releases, ensuring smooth rollouts and catching regressions before they impact users. - Infrastructure Contribution: Help build and improve the frameworks and tools that allow teams to develop and test prompts and features with confidence. - Knowledge Transfer: Mentor product engineers on prompt engineering best practices and help teams build their first evaluations. - Rapid Iteration: Work in a fast-paced environment where model capabilities advance daily, requiring quick adaptation and creative problem-solving. What We’re Looking For Required Qualifications - 5+ years of software engineering experience with Python or similar languages. - Demonstrated experience with LLMs and prompt engineering (through work, research, or significant personal projects). - Strong understanding of evaluation methodologies and metrics for AI systems. - Excellent written and verbal communication skills – you’ll need to explain complex model behaviors to diverse stakeholders. - Ability to manage multiple concurrent projects and prioritize effectively. - Experience with version control, CI/CD, and modern software development practices.
Research Engineer, Machine Learning (Reinforcement Learning)
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the teams Our Reinforcement Learning teams lead Anthropic's reinforcement learning research and development, playing a critical role in advancing our AI systems. We've contributed to all Claude models, with significant impacts on the autonomy and coding capabilities of Claude Sonnet 4.5 and Opus 4.5. Our work spans several key areas: - Developing systems that enable models to use computers effectively - Advancing code generation through reinforcement learning - Pioneering fundamental RL research for large language models - Building scalable RL infrastructure and training methodologies - Enhancing model reasoning capabilities We collaborate closely with Anthropic's alignment and frontier red teams to ensure our systems are both capable and safe. We partner with the applied production training team to bring research innovations into deployed models, and are dedicated to implementing our research at scale. Our Reinforcement Learning teams sit at the intersection of cutting-edge research and engineering excellence, with a deep commitment to building high-quality, scalable systems that push the boundaries of what AI can accomplish. About the Role As a Research Engineer within Reinforcement Learning, you will collaborate with a diverse group of researchers and engineers to advance the capabilities and safety of large language models. This role blends research and engineering responsibilities, requiring you to both implement novel approaches and contribute to the research direction. 
You'll work on fundamental research in reinforcement learning, creating 'agentic' models via tool use for open-ended tasks such as computer use and autonomous software generation, improving reasoning abilities in areas such as mathematics, and developing prototypes for internal use, productivity, and evaluation. Representative projects: - Architect and optimize core reinforcement learning infrastructure, from clean training abstractions to distributed experiment management across GPU clusters. Help scale our systems to handle increasingly complex research workflows. - Design, implement, and test novel training environments, evaluations, and methodologies for reinforcement learning agents that push the state of the art for the next generation of models. - Drive performance improvements across our stack through profiling, optimization, and benchmarking. Implement efficient caching solutions and debug distributed systems to accelerate both training and evaluation workflows. - Collaborate across research and engineering teams to develop automated testing frameworks, design clean APIs, and build scalable infrastructure that accelerates AI research. You may be a good fit if you: - Are proficient in Python and async/concurrent programming with frameworks like Trio - Have experience with machine learning frameworks (PyTorch, TensorFlow, JAX) - Have industry experience in machine learning research - Can balance research exploration with engineering implementation
Manager, Machine Learning Research Scientist, GenAI
Scale AI accelerates the development of AI systems by providing the data, infrastructure, and tooling that power the most advanced models in the world. Our teams operate at the intersection of cutting-edge research, large-scale engineering, and real-world deployment, partnering with leading frontier labs, enterprises, and government agencies to push Generative AI into new capabilities and applications. As AI rapidly evolves from static models to dynamic, agentic systems, Scale is building the foundational research, evaluation methodologies, and agent/RL infrastructure that will define this next era. You’ll join a high-impact research organization driving advances in large-language models, post-training, evaluation, and agentic/RL environments, helping shape how next-generation AI is built, measured, and deployed. As a Research Scientist Manager, you will lead a world-class team of research scientists and engineers, define the research roadmap, and drive execution from early prototyping to deployment. You’ll thrive in a fast-moving environment, balancing deep technical leadership with people management, vision setting, and delivery. You will: - Lead, mentor and grow a team of research scientists and engineers working on GenAI research initiatives (e.g., evaluation, post-training, agents, RL environments). - Define and drive a multi-year research roadmap: identify key scientific questions, set milestones, allocate resources, and ensure rigorous execution. - Collaborate cross-functionally with engineering, product, client-facing teams and external academic or industry partners to translate research into components, insights, and actionable outcomes. - Communicate compellingly: publish research, present at conferences, engage in open-source contributions, and represent the team externally. - Drive an inclusive, high-performing culture: help your team through technical challenges, provide growth opportunities, and attract top talent. 
- Stay deeply connected to the research community, understanding major trends and helping set them. - Thrive in a high-energy, fast-paced startup environment and be ready to dedicate the time and effort needed to drive impactful results. Ideally you'd have: - 5+ years of hands-on research experience (PhD or equivalent preferred) in machine learning, deep learning, generative models, agent/RL systems or related domains. - A strong track record of research excellence, including publications in top-tier ML/AI venues (NeurIPS, ICML, ICLR, ACL, etc.). - Experience and a track record of landing major research impact in a fast-paced environment. - Experience leading or managing research teams. You’re excited to mentor, coach and develop talent. - Excellent written and verbal communication skills. You are able to articulate research ideas and outcomes to both technical and non-technical stakeholders. Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are
Research Engineer, Codex
ABOUT THE TEAM The Codex team is responsible for building state-of-the-art AI systems that can write code, reason about software, and act as intelligent agents for developers and non-developers alike. Our mission is to push the frontier of code generation and agentic reasoning, and deploy these capabilities in real-world products such as ChatGPT and the API, as well as in next-generation tools specifically designed for agentic coding. We operate across research, engineering, product, and infrastructure—owning the full lifecycle of experimentation, deployment, and iteration on novel coding capabilities. ABOUT THE ROLE As a member of the Codex team, you will advance the capabilities, performance, and reliability of AI coding models through a combination of research, experimentation, and system optimization. You’ll collaborate with world-class researchers and engineers to develop and deploy systems that help millions of users write better code, faster—while also ensuring these systems are efficient, cost-effective, and production-ready. We’re looking for people who combine deep curiosity, strong technical fundamentals, and a bias toward impact. Whether your strengths lie in ML research, systems engineering, or performance optimization, you’ll play a pivotal role in pushing the state of the art and bringing these advances into the hands of real users. This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees. IN THIS ROLE, YOU MIGHT: - Design and run experiments to improve code generation, reasoning, and agentic behavior in Codex models. - Develop research insights into model training, alignment, and evaluation. - Hunt down and address inefficiencies across the Codex system stack—from agent behavior to LLM inference to container orchestration—and land high-leverage performance improvements. - Build tooling to measure, profile, and optimize system performance at scale. 
- Work across the stack to prototype new capabilities, debug complex issues, and ship improvements to production. YOU MIGHT THRIVE IN THIS ROLE IF YOU: - Are excited to explore and push the boundaries of large language models, especially in the domain of software reasoning and code generation. - Have strong software engineering skills and enjoy quickly turning ideas into working prototypes. - Think holistically about performance, balancing speed, cost, and user experience. - Bring creativity and rigor to open-ended research problems and thrive in highly iterative, ambiguous environments. - Have experience operating across both ML systems and cloud infrastructure. About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement https://cdn.openai.com/policies/eeo-policy-statement.pdf. 
Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. Fo
AI Applications Ops Lead, GPS
Role Overview Scale’s rapidly growing Global Public Sector team is focused on using AI to address critical challenges facing the public sector around the world. Our core work consists of: - Creating custom AI applications that will impact millions of citizens - Generating high-quality training data for national LLMs - Upskilling and advisory services to spread the impact of AI As a Production AI Ops Lead, you will design and develop the production lifecycle of full-stack AI applications, while supporting end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and the resilient cloud infrastructure required for our international government partners. At Scale, we’re not just building AI solutions—we’re enabling the public sector to transform their operations and better serve citizens through cutting-edge technology. If you’re ready to shape the future of AI in the public sector and be a founding member of our team, we’d love to hear from you. You will: - Own the production outcome: Take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies. - Ensure Full-Stack integrity: Oversee the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive and production-ready environment. - Scale the feedback loop: Build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring the right levels of reliability. - Navigate global compliance: Manage the technical lifecycle within diverse regulatory frameworks. - Incident command: Lead the response for production issues in mission-critical environments, ensuring rapid resolution and building the guardrails to prevent them from happening again. 
- Bridge the gap: Translate deep technical performance metrics into clear insights for senior international government officials. - Drive product evolution: Partner with our Engineering and ML teams to ensure the lessons learned in the field directly influence the technical architecture and decisions of future use cases. Ideally, you have: - Experience: 6+ years in a high-impact technical role (SRE, FDE or MLOps) with experience in the public sector. - Global perspective: Familiarity with international government security standards and the complexities of deploying sovereign AI. - System architecture proficiency: Proven experience maintaining production-grade applications with a deep understanding of the full request lifecycle, connecting frontend/API layers to the backend and AI core. - Modern AI Stack expertise: Proficiency in coding and the modern AI infrastructure, including Kubernetes, vector databases, agentic development, and LLM observability tools. - Ownership: You treat every production deployment as your own. You race toward solving hard problems before the customer even sees them. - Reliability: You understand that in the public sector, a model failure m
Researcher, Alignment Science
ABOUT THE TEAM The Alignment Science team at OpenAI studies the science of intent alignment: how to train models to understand what users are actually asking for, act faithfully on that intent while respecting safety constraints, verify what they did, and report their limitations honestly. Our work sits alongside broader value alignment efforts, but this team focuses on scalable methods for ensuring instruction-following, honesty, and robustness as models become more capable. We work on both sides of alignment research: producing externally publishable results and integrating promising techniques into the models OpenAI deploys. Recent team research on model confessions studies how models can be trained to honestly report shortcomings after their original answer, including failures involving hallucination, instruction following, scheming, and reward hacking. That work reflects a broader agenda: build scalable and general methods to ensure models follow human intent. The team uses a mix of training and evaluation methods, with a focus on reinforcement learning. We care about rigorous, quantitative research that can translate into safer model behavior. ABOUT THE ROLE As a Research Engineer / Research Scientist on the Alignment team, you will design and run experiments that help increasingly capable models follow user intent, remain calibrated about correctness and risk, and honestly surface their own mistakes. You will work on hands-on model training, evaluation design, and research infrastructure, while helping turn promising alignment methods into techniques that can be used in frontier model development. This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees. We are also open to exceptional remote candidates who can operate independently and collaborate closely with the team. 
IN THIS ROLE, YOU WILL: - Design and implement alignment experiments focused on intent following, honesty, calibration, and robustness. - Train and evaluate models using reinforcement learning, and other empirical ML methods. - Develop evaluations for failure modes such as hallucination, instruction-following failures, reward hacking, covert actions, and scheming. - Study methods that encourage models to verify their behavior and report shortcomings honestly, including confession-style training objectives. - Build monitoring and inference-time interventions that ensure compliant behavior or surface model issues to users or downstream systems. - Investigate how alignment methods scale with model capability, compute, data, context length, action length, and adversarial pressure. - Integrate successful techniques into model training and deployment workflows. - Produce externally publishable research when results advance the broader science of alignment. - Collaborate with researchers and engineers across post-training, RL, evaluations, safety, and product-facing teams. YOU MIGHT THRIVE IN THIS ROLE IF YOU: - Have strong hands-on experience training, evaluating, or debugging large ML models, especially LLMs. - Have excellent engineering skills in Python and modern ML frameworks such as PyTorch. - Bring mathematical rigor, quantitative taste, and comfort turning ambiguous research questions into measurable experiments. - Have experience with reinforcement learning, post-training, preference optimization, scalable oversight, model evaluation, or adjacent empirical ML research. - Can operate with high independence and do not need close day-to-day handholding. - Enjoy fast-paced, collaborative research environments where priorities shift as models and evidence change. - Have a strong record in technical problem solving, such as competitive programming, math contests, systems work, or similarly rigorous engineering and research projects. 
- Care about building AI systems that are trustworthy, honest, and reliable in high-stakes settings. - Are motivated by making concrete
ML Research Engineer, ML Systems
Scale’s ML platform (RLXF) team builds our internal distributed framework for large language model training and inference. The platform has been powering MLEs, researchers, data scientists and operators for fast and automatic training and evaluation of LLMs, as well as evaluation of data quality. Scale is uniquely positioned at the heart of the field of AI as an indispensable provider of training and evaluation data and end-to-end solutions for the ML lifecycle. You will work closely across Scale’s ML teams and researchers to build the foundation platform that supports all our ML research and development. You will be building and optimizing the platform to enable our next generation of LLM training, inference and data curation. If you are excited about shaping the future of AI via fundamental innovations, we would love to hear from you! You will: - Build, profile and optimize our training and inference framework - Collaborate with ML teams to accelerate their research and development and enable them to develop the next generation of models and data curation - Research and integrate state-of-the-art technologies to optimize our ML system Ideally you’d have: - Strong excitement about system optimization - Experience with multi-node LLM training and inference - Experience with developing large-scale distributed ML systems - Strong software engineering skills, proficient in frameworks and tools such as CUDA, PyTorch, transformers, flash attention, etc. - Strong written and verbal communication skills and the ability to operate in a cross-functional team environment Nice to haves: - Demonstrated expertise in post-training methods and/or next-generation use cases for large language models, including instruction tuning, RLHF, tool use, reasoning, agents, and multimodal. Compensation packages at Scale for eligible roles include base salary, equity, and benefits. 
The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend. Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in the locations of San Francisco, New York, Seattle is: $189,600 — $237,000 USD PLEASE NOTE: Our po
Member of Technical Staff (AI Researcher)
Perplexity is seeking top-tier AI Research Scientists and Engineers to advance our AI products and capabilities. We're building the future of AI-powered search and agent experiences through our Sonar models, Deep Research Agent, Comet Agent, and Search products. Join us in creating SOTA experiences that handle hundreds of millions of queries and continue to scale rapidly. Team Structure Depending on your interests and expertise, you'll work on one of three specialized teams: 1. Core Research Team (Horizontal) Focus on generating and improving base models that power all our products. This team works on foundational model capabilities, post-training techniques, building RL infra and infrastructure that benefits the entire organization. 2. Agent Products Team (Vertical) Concentrate on fine-tuning and optimizing models for our Deep Research Agent and Labs/Canvas products. This team bridges research and product, ensuring our agent capabilities deliver exceptional user experiences. 3. Comet Agent Team (Vertical) Dedicated to developing and enhancing our Comet Agent product. This specialized team focuses on the unique requirements and optimizations needed for Comet's specific use cases. 
Responsibilities Research & Development - Post-train SOTA LLMs using the latest supervised and reinforcement learning techniques (SFT/DPO/GRPO) - Leverage our rich query/answer dataset to scale model performance across Sonar, Deep Research, Comet, and Search products - Stay current with the latest LLM research, especially in model training, optimization, and personalization techniques - Implement preference optimization and personalization capabilities to enhance user experience - Invent in-house improvements and optimizations to enhance SOTA models - Turn research ideas into algorithms and run experiments to launch new models Infrastructure & Implementation - Own full-stack data, training, and evaluation pipelines required for model development - Build robust and effective training frameworks (on top of Megatron/PyTorch) for post-training LLMs - Implement necessary infrastructure and components to support cutting-edge model training at scale - Integrate models seamlessly into our product ecosystem Collaboration - Work closely with engineering teams to integrate models into Perplexity's product suite - Collaborate across teams to ensure cohesive AI experiences throughout our platform - Partner with product teams to understand user needs and translate them into model improvements Qualifications Required - Proven experience with large-scale LLMs and Deep Learning systems - Strong programming skills in Python/PyTorch; versatility is a plus - Experience with post-training techniques and reinforcement learning - Self-starter with a willingness to take ownership of tasks - Passion for tackling challenging problems - Minimum 2-6 years of experience on relevant projects (depending on seniority level) Nice-to-have - PhD in Machine Learning, AI, Systems, or related areas - Experience in post-training LLMs with SFT/DPO/GRPO - C++/CUDA programming skills - Experience building LLM training frameworks - Academic publications and research impact - Experience with agent systems 
and multi-step reasoning - Background in personalization and preference learning
Staff Software Engineer, Enterprise GenAI
Scale GP (Scale Generative AI Platform) is an enterprise-grade Generative AI platform that provides APIs for knowledge retrieval, inference, evaluation, and more. We are looking for a strong engineer to join our team and help us build and scale our product in a fast-paced environment. The ideal candidate will have a strong understanding of software engineering principles and practices, as well as experience with large-scale distributed systems. You will be responsible for owning large new areas within our product, working across backend and frontend, and interacting with LLMs and ML models. You will solve hard engineering problems in scalability and reliability. You will: - Own large new areas within our product - Work across backend and frontend, and interact with LLMs and ML models - Deliver experiments at a high velocity and level of quality to engage our customers - Work across the entire product lifecycle from conceptualization through production - Be able, and willing, to multi-task and learn new technologies quickly Ideally you'd have: - 7+ years of full-time engineering experience, post-graduation - Experience scaling products at hyper-growth startups - Experience tinkering with or productizing LLMs, vector databases, and the other latest AI technologies - Proficiency in Python or JavaScript/TypeScript, and SQL - Experience with Kubernetes - Experience with major cloud providers (AWS, Azure, GCP) Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. 
Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend. Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in the locations of San Francisco, New York, Seattle is: $252,000 — $315,000 USD PLEASE NOTE: Our policy requires a 90-day waiting period before reconsidering candidates for the same role. This allows us to ensure a fair and thorough evaluation of all applicants. About Us: At Scale, our mission is to develop reliable AI systems for the world's most important decisions. Our products provide the high-quality data and full-sta
Research Engineer, Applied AI Engineering
About the Team OpenAI is at the forefront of artificial intelligence, driving innovation and shaping the future with cutting-edge research. Our mission is to ensure that AI's benefits reach everyone. We are looking for visionary Research Engineers to join our Applied Group, where you'll transform groundbreaking research into real-world applications that can change industries, enhance human creativity, and solve complex problems. About the Role As a Research Engineer in OpenAI's Applied Group, you will have the opportunity to work with some of the brightest minds in AI. You'll contribute to deploying state-of-the-art models in production environments, helping turn research breakthroughs into tangible solutions. If you're excited about making AI technology accessible and impactful, this role is your chance to make a significant mark. In this role, you will: - Innovate and Deploy: Design and deploy advanced machine learning models that solve real-world problems. Bring OpenAI's research from concept to implementation, creating AI-driven applications with a direct impact. - Collaborate with the Best: Work closely with researchers, software engineers, and product managers to understand complex business challenges and deliver AI-powered solutions. Be part of a dynamic team where ideas flow freely and creativity thrives. - Optimize and Scale: Implement scalable data pipelines, optimize models for performance and accuracy, and ensure they are production-ready. Contribute to projects that require cutting-edge technology and innovative approaches. - Learn and Lead: Stay ahead of the curve by engaging with the latest developments in machine learning and AI. Take part in code reviews, share knowledge, and lead by example to maintain high-quality engineering practices. - Make a Difference: Monitor and maintain deployed models to ensure they continue delivering value. Your work will directly influence how AI benefits individuals, businesses, and society at large. 
You might thrive in this role if you: - Have a Master's/PhD degree in Computer Science, Machine Learning, Data Science, or a related field. - Have demonstrated experience in deep learning and transformer models. - Are proficient in frameworks like PyTorch or TensorFlow. - Have a strong foundation in data structures, algorithms, and software engineering principles. - Have experience with search relevance, ads ranking or LLMs (a plus). - Are familiar with methods of training and fine-tuning large language models, such as distillation, supervised fine-tuning, and policy optimization. - Have excellent problem-solving and analytical skills, with a proactive approach to challenges. - Are able to work collaboratively with cross-functional teams. - Are able to move fast in an environment where things are sometimes loosely defined and may have competing priorities or deadlines. - Enjoy owning problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done. About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement https://cdn.openai.com/policies/eeo-policy-statement.pdf. 
Background checks for applicants will be administered in accordance with applicable law, and q
Machine Learning Research Engineer, Agent Data Foundation - Enterprise GenAI
AI is becoming vitally important in every function of our society. At Scale, our mission is to accelerate the development of AI applications. For 9 years, Scale has been the leading AI data foundry, helping fuel the most exciting advancements in AI, including generative AI, defense applications, and autonomous vehicles. With our recent investment from Meta, we are doubling down on building out state-of-the-art post-training algorithms to reach the performance necessary for complex agents in enterprises around the world. The Enterprise ML Research Lab works on the front lines of this AI revolution. We are working on an arsenal of proprietary research, tools, and resources that serve all of our enterprise clients. As an MLRE on the Data Foundation team, you’ll work on cutting-edge research to define the data flywheel that makes the whole machine move. This includes research around synthetic environments from task definitions, building agents for trace analysis, and contributing to a cutting-edge framework that automatically hill-climbs agent-building from an eval set. This will involve creating best-in-class agents that achieve state-of-the-art results through a combination of post-training and agent-building algorithms. If you are excited about shaping the future of the modern GenAI movement, we would love to hear from you! You will: - Build synthetic data pipelines to generate enterprise environments to use for RL post-training - Create agents to convert traces from production into actionable insights to use to improve agents - Contribute to our agent-building product, which can construct other agents using coding agents and proprietary algorithms - Train state-of-the-art models, developed both internally and by the community, to deploy to our enterprise customers. 
Ideally you’d have: - 3+ years of building with LLMs in a production environment - Clear experience constructing high-quality data to improve an LLM or agent - Publications in top conferences such as NeurIPS, ICLR, or ICML within the last two years - PhD or Master's in Computer Science or a related field Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position and may be inclusive of several career levels at Scale; it will be determined during the interview process based on work location and additional factors, including job-related skills, experience, qualifications, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Directors approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for an equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend. Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in the locations of San Francisco, New York, Seattle is: $250,000
Research Engineer, Privacy
About the Team The Privacy Engineering Team at OpenAI is committed to integrating privacy as a foundational element in OpenAI's mission of advancing Artificial General Intelligence (AGI). Our focus is on all OpenAI products and systems handling user data, striving to uphold the highest standards of data privacy and security. We build essential production services, develop novel privacy-preserving techniques, and equip cross-functional engineering and research partners with the necessary tools to ensure responsible data use. Our approach to prioritizing responsible data use is integral to OpenAI's mission of safely introducing AGI that offers widespread benefits. About the Role As a part of the Privacy Engineering Team, you will work on the frontlines of safeguarding user data while ensuring the usability and efficiency of our AI systems. You will help us understand and implement the latest research in privacy-enhancing technologies such as differential privacy, federated learning, and data memorization. Moreover, you will focus on investigating the interaction between privacy and machine learning, developing innovative techniques to improve data anonymization, and preventing model inversion and membership inference attacks. This position is located in San Francisco. Relocation assistance is available. In this role, you will: - Design and prototype privacy-preserving machine-learning algorithms (e.g., differential privacy, secure aggregation, federated learning) that can be deployed at OpenAI scale. - Measure and strengthen model robustness against privacy attacks such as membership inference, model inversion, and data memorization leaks—balancing utility with provable guarantees. - Develop internal libraries, evaluation suites, and documentation that make cutting-edge privacy techniques accessible to engineering and research teams. 
- Lead deep-dive investigations into the privacy–performance trade-offs of large models, publishing insights that inform model-training and product-safety decisions. - Define and codify privacy standards, threat models, and audit procedures that guide the entire ML lifecycle—from dataset curation to post-deployment monitoring. - Collaborate across Security, Policy, Product, and Legal to translate evolving regulatory requirements into practical technical safeguards and tooling. You might thrive in this role if you: - Have hands-on research or production experience with PETs. - Are fluent in modern deep-learning stacks (PyTorch/JAX) and comfortable turning cutting-edge papers into reliable, well-tested code. - Enjoy stress-testing models—probing them for private data leakage—and can explain complex attack vectors to non-experts with clarity. - Have a track record of publishing (or implementing) novel privacy or security work and relish bridging the gap between academia and real-world systems. - Thrive in fast-moving, cross-disciplinary environments where you alternate between open-ended research and shipping production features under tight deadlines. - Communicate crisply, document rigorously, and care deeply about building AI systems that respect user privacy while pushing the frontiers of capability. About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. 
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please
Machine Learning Engineer, Integrity
About the Team The Integrity team at OpenAI is dedicated to ensuring that our cutting-edge technology is not only revolutionary, but also secure from a myriad of adversarial threats. We strive to maintain the integrity of our platforms as they scale. The Integrity team is at the front lines of defending against misuse in all its forms: content abuse, scaled attacks, and other actions that could undermine the user experience or harm our operational stability. About the Role As a Machine Learning Engineer in OpenAI's Integrity team, you will have the opportunity to work with some of the brightest minds in AI. You’ll work on state-of-the-art models and classifiers, experiment with new architecture and approaches, and push forward our abilities in content and user understanding. You’ll help turn research breakthroughs into tangible solutions that improve the trust and safety of our platform. If you're excited about training LLMs and building ML models, this role is your chance to make a significant mark. In this role, you will: - Innovate and Deploy: Design and deploy advanced machine learning models that solve real-world problems. Bring OpenAI's research from concept to implementation, creating AI-driven applications with a direct impact. - Collaborate with the Best: Work closely with researchers, software engineers, and product managers to understand complex business challenges and deliver AI-powered solutions. Be part of a dynamic team where ideas flow freely and creativity thrives. - Optimize and Scale: Implement scalable data pipelines, optimize models for performance and accuracy, and ensure they are production-ready. Contribute to projects that require cutting-edge technology and innovative approaches. - Learn and Lead: Stay ahead of the curve by engaging with the latest developments in machine learning and AI. Take part in code reviews, share knowledge, and lead by example to maintain high-quality engineering practices. 
- Make a Difference: Monitor and maintain deployed models to ensure they continue delivering value. Your work will directly influence how AI benefits individuals, businesses, and society at large. You might thrive in this role if you: - Have a Master's or PhD degree in Computer Science, Machine Learning, Data Science, or a related field. - Have demonstrated experience in deep learning and transformer models. - Have experience with content understanding or abuse prevention with LLMs (a plus). - Are proficient in frameworks like PyTorch or TensorFlow. - Have a strong foundation in data structures, algorithms, and software engineering principles. - Are familiar with methods of training and fine-tuning large language models, such as distillation, supervised fine-tuning, and policy optimization. - Have excellent problem-solving and analytical skills, with a proactive approach to challenges. - Can work collaboratively with cross-functional teams. - Can move fast in an environment where things are sometimes loosely defined and may have competing priorities or deadlines. - Enjoy owning problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done. About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. 
For additional information, please see OpenAI’s Affirmative Action and Equal Employm
Engineering Manager (AI Research & Model Training)
Perplexity is seeking a Research Engineering Manager to lead the team of all-star AI researchers and engineers responsible for developing the models that drive our products. Our team has developed some of the most advanced models for agentic research, query understanding, and other domains that require accuracy and depth. As we expand our userbase and portfolio of product surfaces, our in-house models are increasingly critical to providing a premium, high-taste experience for the world’s most sophisticated users. You will dive into our rich datasets of conversational and agentic queries, leveraging cutting‑edge training techniques to scale AI model performance. Through hands-on technical and organizational leadership, you will empower your team to develop SotA models for the use cases that matter most to our business and our users. RESPONSIBILITIES - Lead a team of researchers and engineers focused on training SotA models for Perplexity-relevant use cases, leveraging the latest supervised and reinforcement learning techniques. - Drive research and engineering efforts to develop production models through advanced model training and alignment techniques, including RL, SFT, and other approaches. - Become deeply familiar with the team’s technical stack, leading from the front through hands-on technical contributions. - Own the data, training, and eval pipelines required to train and continuously improve LLM models. - Design and iterate on model training and finetuning algorithms (e.g., preference‑based methods, reinforcement learning from human or AI feedback) through an approach that balances scientific rigor and iteration velocity. - Design evaluations and improve the production model training pipeline to reliably deliver models that lie on the Pareto frontier of speed and quality. - Work closely with engineering teams to integrate in-house models into our product and rapidly iterate based on real‑world usage. 
- Manage day‑to‑day execution, project planning, and prioritization for the model training team to hit ambitious quality and performance goals. QUALIFICATIONS - Proven experience with large-scale LLMs and Deep Learning systems. - Strong Python and PyTorch skills; versatility across languages and frameworks is a plus. - Experience leading or managing research or engineering teams working on large-scale AI model development, including driving complex projects from idea to production. - Self‑starter with a willingness to take ownership of tasks and navigate ambiguity in a fast‑moving environment. - Passion for tackling challenging problems in AI model quality, speed, safety, and reliability. - 10+ years of technical experience, with at least 2 of those years as a manager and at least 4 of those years working on large-scale AI model development. NICE-TO-HAVE - PhD in Machine Learning or related areas. - Experience training very large Transformer-based models with techniques such as SFT, DPO, GRPO, RLHF‑style methods, or related preference‑based optimization approaches. - Prior experience designing evaluations and production training pipelines for large‑scale models in a high‑growth environment.
Senior Forward Deployed Data Scientist/Engineer
At Scale AI, we help leading enterprises turn AI from a promising capability into reliable systems that improve real workflows and deliver measurable business value. We are hiring a Senior Forward Deployed Data Scientist / Engineer to work directly with customers on ambiguous, high-impact problems at the intersection of data science, product development, and AI deployment. This is not a traditional analytics role. On this team, data scientists do the core statistical and modeling work, but they also build real tools and products: evaluation explorers, operator workflows, decision-support systems, experimentation surfaces, and customer-specific AI/data applications that get used in production. In many cases, the data scientist builds the first usable version of the solution, proves value quickly, and helps drive it into a durable product or platform capability. The right candidate is strong in first-principles problem solving, rigorous measurement, and technical execution. They know how to define metrics, design experiments, diagnose failures, and build systems that people actually use. They are also comfortable using modern AI-assisted development tools to prototype and iterate quickly without sacrificing reliability, observability, or judgment. Python and SQL matter in this role, but as execution fluency in service of building better products and making better decisions. 
What you’ll do - Partner directly with enterprise customers to understand workflows, operational pain points, constraints, and success criteria - Turn ambiguous business and product problems into measurable solutions with clear metrics, technical designs, and deployment plans - Design and build internal and customer-facing data products, including evaluation tools, workflow applications, decision-support systems, and thin product layers on top of data/ML systems - Build end-to-end solutions across data ingestion, transformation, experimentation, statistical modeling, deployment, monitoring, and iteration - Design evaluation frameworks, benchmarks, and feedback loops for ML/LLM systems, human-in-the-loop workflows, and model-assisted operations - Apply rigorous statistical thinking to experimentation, causal inference, metric design, forecasting, segmentation, diagnostics, and performance measurement - Use AI-assisted development workflows to accelerate prototyping and product iteration, while maintaining strong engineering discipline - Diagnose failure modes across data quality, model behavior, retrieval, workflow design, and user experience, and drive fixes into production - Act as the voice of the customer to Product, Engineering, and Data Science, using field learnings to shape roadmap and platform capabilities What we’re looking for - 5+ years of experience in data science, machine learning, quantitative engineering, or another highly analytical technical role - Proven track record of shipping data, ML, or AI systems that delivered measurable business or product impact - Exceptional ability to structure ambiguous problems, define the right success metrics, and translate them into executable technical plans - Strong foundation in statistics, experimentation, causal reasoning, and measurement - Experience building tools or products, not just analyses — for example internal workflow tools, evaluation systems, operator-facing products, experimentation platforms, or 
customer-specific applications - Hands-on fluency in Python, SQL, and modern data/AI tooling; able to inspect d
AI Strategy Consultant, Frontier Tech
As a member of our Frontier Tech Consultant team, you will play a critical role in advancing cutting-edge AI innovations by conducting high-impact experiments and ensuring seamless execution at the highest quality standards. Your work will directly contribute to Scale AI’s growth, shaping the future of artificial intelligence. In this role, you will be working on various types of projects, including but not limited to: research experiments, dataset generation, data quality improvements, and in-depth technical analysis. You will tackle complex technical and operational challenges while collaborating closely with Scale’s ML research scientists and SPM team. The ideal candidate is analytical, detail-oriented, and results-driven, with strong problem-solving abilities and excellent communication skills. We are looking for someone who thrives in a fast-paced environment, is proactive in overcoming challenges, and is committed to delivering exceptional outcomes. If you are eager to contribute to the forefront of AI innovation, we encourage you to apply. You will be responsible for: - Designing and executing research experiments - Building and evaluating frontier LLM datasets - Developing training and testing material for frontier pipelines - Improving the quality of existing and new products Ideally you’d have: - Strong machine learning knowledge, either by being in the final years of an ML PhD program or having already graduated - Strong writing and verbal communication skills - An action-oriented mindset that balances creative problem solving with the scrappiness to ultimately deliver results - Analytical, planning, and process improvement capability - Experience working in a fast-paced, entrepreneurial environment - Technical skills including familiarity with Python, GPUs, AWS, APIs, LLMs, ML, and SQL Pay: $60-80/hr Commitment: This is a fully remote, US-based part-time (10-20 hours per week), ongoing contract position staffed via HireArt. 
HireArt values diversity and is an Equal Opportunity Employer. We are interested in every qualified candidate who is eligible to work in the United States. Unfortunately, we are not able to sponsor visas, including CPT/OPT, or employ corp-to-corp. PLEASE NOTE: Our policy requires a 90-day waiting period before reconsidering candidates for the same role. This allows us to ensure a fair and thorough evaluation of all applicants. About Us: At Scale, our mission is to develop reliable AI systems for the world's most important decisions. Our products provide the high-quality data and full-stack technologies that power the world's leading models, and help enterprises and governments build, deploy, and oversee AI applications that deliver real impact. We work closely with industry leaders like Meta, Cisco, DLA Piper, Mayo Clinic, Time Inc., the Government of Qatar, and U.S. government agencies including the Army and Air Force. We are expanding our team to accelerate the development of AI applications.
Staff Research Engineer, Discovery Team
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the Team Our team is organized around the north star goal of building an AI scientist – a system capable of solving the long-term reasoning challenges and basic capabilities necessary to push the scientific frontier. Our team likes to think across the whole model stack. Currently the team is focused on improving models' abilities to use computers – as a laboratory for long-horizon tasks and a key blocker to many scientific workflows. About the role As a Research Engineer on our team you will work end to end, identifying and addressing key blockers on the path to scientific AGI. Strong candidates should have familiarity with language model training, evaluation, and inference, be comfortable triaging research ideas and diagnosing problems, and enjoy working collaboratively. Familiarity with performance optimization, distributed systems, VM/sandboxing/container deployment, and large-scale data pipelines is highly encouraged. Join us in our mission to develop advanced AI systems that are both powerful and beneficial for humanity. 
Responsibilities: - Work across the full stack to identify and remove bottlenecks preventing progress toward scientific AGI - Develop approaches to address long-horizon task completion and complex reasoning challenges essential for scientific discovery - Scale research ideas from prototype to production - Create benchmarks and evaluation frameworks to measure model capabilities in scientific workflows and computer use - Implement distributed training systems and performance optimizations to support large-scale model development You may be a good fit if you: - Have 8+ years of ML research experience - Are familiar with large-scale language model training, evaluation, and inference pipelines - Enjoy obsessively iterating on immediate blockers towards long-term goals - Thrive working collaboratively to solve problems - Have expertise in performance optimization and distributed computing systems - Show strong problem-solving skills and ability to identify technical bottlenecks in complex systems - Can translate research concepts into scalable engineering solutions - Have a track record of shipping ML systems that tackle challenging multi-step reasoning problems Strong candidates may also have: - Expertise with performance optimization for language model inference and training - Experience with computer use automation and agentic AI systems - A history working on reinforcement learning approaches for complex task completion - Knowledge of containerization technologies (Docker, Kubernetes) and cloud deployment at scale - Demonstrated ability to work across multiple domains (language modeling, systems engineering, scientific computing) - Experience with VM/sandboxing/container deployment and large-scale data processing - Expe
GenAI Strategic Projects Lead, Public Sector
Scale is at the frontier of the AI industry, improving the world’s leading generative AI and large language models through model evaluations, human-powered supervised fine-tuning datasets, world-class reinforcement learning from human feedback, and more. Scale AI’s Public Sector team is growing in the Generative AI space, and we’re seeking a Strategic Projects Lead to own high-impact projects that drive revenue and experimentation. In this role, you’ll work across operations, engineering, and customer engagement to produce world-class training and test and evaluation data for Large Language Models for our Public Sector customers. This role offers a rare opportunity to make a meaningful impact at the intersection of AI and national security. You will help build Generative AI data-labeling pipelines from the ground up, create operational processes to manage and optimize an in-house expert data workforce, and develop novel technology-driven approaches (e.g., scripts, prompt engineering, hybrid data) to improve the quality of our training and evaluation datasets. In addition, you will partner directly with our internal machine learning experts and external stakeholders to ensure our data enables the development of mission-critical applications of AI. You will: - Develop, build, and maintain the infrastructure required to ensure data pipelines are efficient, scalable, and produce high-quality outputs - Take ownership of day-to-day progress on high-priority data production pipelines, ensuring projects move forward efficiently - Partner with subject matter experts in their fields to validate the quality of our data and to translate deep domain knowledge into scalable processes and measurable outcomes - Work closely with customers to understand their requirements and design data taxonomies that optimize model performance. 
- Utilize analytics and data visualization tools to track progress, identify bottlenecks, and make data-driven decisions to optimize pipeline performance - Drive cross-org collaboration to define and advance human data strategy, influencing technical and non-technical stakeholders to ensure data quality, scalability, and long-term platform leverage - Own larger and larger components of our data delivery processes, until you ultimately serve as the full owner of our most visible and high-impact customer pipelines You have: - 5+ years of experience in product development, data science, or operations - A history of successful project management and comfort in ambiguity - Ability to analyze complex operational data, build queries, and identify trends to inform decisions and optimize processes - Technical aptitude to understand how to produce data for state-of-the-art post-training techniques such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), Reinforcement Learning with Verifiable Rewards (RLVR), etc. Nice to have: - Experience working in defense tech and/or an AI company - A technical degree in fields like computer science, data science, or engineering - A deep understanding of ML operations for generative AI workflows / products - An active Top Secret security clearance
Research Engineer, Safeguards Labs
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the Team Safeguards Labs is a new team operating at the intersection of research and engineering, chartered to investigate novel safety methods that protect Claude and the people who use it. We prototype new approaches to safe models, usage safeguards, and production safety — pressure-testing ideas through offline analysis and subsets of traffic before they graduate into production systems run by our partner Safeguards teams. Our work overlaps closely with account abuse, model behavior safeguards, and other safeguard subteams, and we serve as a research arm that can take on ambitious, ambiguous problems and turn them into deployed defenses. About the Role We're hiring research engineers to define and execute the Labs research agenda. You'll scope your own projects, run experiments end-to-end, and decide when an idea is ready to hand off to a production team — or when to kill it and move on. The team is small and being built deliberately around a roughly 3:1 mix of researchers to software engineers, so each person has substantial latitude over what they work on and high leverage on the team's direction. Responsibilities: - Lead and contribute to research projects investigating new methods for detecting misuse of Claude, identifying malicious organizations and accounts, strengthening model safeguards, and other safety needs. - Design and run offline analyses over model usage data to surface abuse patterns, build classifiers and detection systems, and evaluate their effectiveness. - Develop and iterate on prototypes that could eventually feed signals into the real-time safeguards path, partnering with engineers on tech transfer. 
- Contribute to a broader research portfolio investigating methods for detecting abusive behavior in chat-based or agentive workflows, and for training the model to robustly refrain from dangerous responses or behaviors without over-refusing. - Build evaluations and methodologies for measuring whether safeguards actually work, including in agentic settings. - Write up findings clearly so they inform decisions across Trust & Safety, research, and product teams. You may be a good fit if you: - Have a track record of independently driving research projects from ambiguous problem statements to concrete results, ideally in AI, ML, security, integrity, or a related technical field. - Are comfortable scoping your own work and switching between research, engineering, and analysis as a project demands. - Have working familiarity with how large language models operate — sampling, prompting, training — even if LLMs aren't your primary background. - Are proficient in Python and comfortable working with large datasets. - Care about the societal impacts of AI and want your work to directly reduce real-world harm. Strong candidates may also have: - Experience building and training machine learning models, including classifiers for abuse, fraud, integrity, or security applications. - Knowledge