Mastercard is a leading global payments & technology company that connects consumers, businesses, merchants, issuers & governments around the world.

What You'll Do

Drive Platform Infrastructure: Own DevOps and infrastructure for MLOps and agentic AI systems, establishing reusable patterns for CI/CD, scalable inference, orchestration, observability, and cost control. Design secure, scalable, repeatable systems using Infrastructure as Code (IaC) to support R&D workloads.
Build Secure CI/CD & Automation Systems: Enable secure tool access, workload isolation, and infrastructure for LLM-backed APIs and MCP servers, while partnering with security and compliance teams on access control, infrastructure governance, and auditability.
Ensure Reliability & Observability: Implement monitoring, logging, and alerting. Tune observability for ML-specific workloads to ensure performance, reliability, and operational insight.
Provide Technical Leadership: Offer hands-on leadership across DevOps and platform initiatives. Review code, enforce best practices, improve tooling, and promote clean, well-tested infrastructure.
Cross-Functional Collaboration: Partner with ML, software, and platform engineers to design deployment strategies, scope work, manage agile deliverables, and meet milestones.

What You'll Bring

Extensive DevOps Experience: 8–12+ years in DevOps, SRE, or platform engineering, including senior/lead roles. Experience designing end-to-end infrastructure systems, solving scale and performance challenges, and operating platforms in production.
Cloud & Infrastructure Expertise: Strong skills in cloud platforms (AWS, Azure, or GCP) and AI/ML components such as Databricks, Azure ML, and MLflow. Deep experience with Infrastructure as Code using Terraform and IaC orchestration tools such as Terragrunt.
Container & Orchestration Mastery: Expertise in Kubernetes and Docker, including how they optimize ML development workflows. Experience with container security, networking, and cluster management at scale.
AI/ML Platform Knowledge: Understanding of ML workflow requirements, including model registries, feature stores, AI agents, Retrieval-Augmented Generation (RAG) techniques, and frameworks such as LangChain and LlamaIndex.
Leadership & Mentorship: Ability to translate ambiguous goals into clear plans, guide engineers, and lead technical execution.
Problem-Solving Mindset: Approach issues systematically, using analysis and data to select scalable, maintainable solutions.

Required Skills

Education & Background: Bachelor's degree in Computer Science, Engineering, or a related field. 8–12+ years of proven experience architecting and operating production-grade infrastructure, especially infrastructure supporting AI/ML workloads.
Infrastructure as Code: Expert in Terraform and IaC orchestration tools such as Terragrunt. Strong experience with configuration management and GitOps practices.
Programming & Scripting: Advanced Bash and Python skills and strong software engineering fundamentals (version control, CI, code reviews). Familiarity with Go or other systems programming languages is a plus.
CI/CD & Automation: Hands-on experience with Jenkins, GitHub Actions, GitLab CI, or similar tools. Strong understanding of pipeline design, artifact management, and deployment strategies.
Monitoring & Observability: Experience with monitoring stacks such as Prometheus, Grafana, Splunk, and ELK. Skilled in building dashboards and alerts and in tuning observability for ML-specific use cases.
Cloud Infrastructure: Experience deploying systems on AWS, Azure, or GCP. Familiarity with cloud-native services, serverless computing, and managed Kubernetes offerings (EKS, AKS, GKE). Comfortable with Linux internals and shell scripting.
Security & Networking: Knowledge of security best practices for MLOps, including data privacy, compliance, access controls, and encryption. Understanding of secure communication protocols such as mutual TLS (mTLS) and of secure service-to-service communication.
Collaboration & Agile Delivery: Strong communication skills and experience working with cross-functional teams. Ability to document designs clearly and deliver iteratively using agile practices.

Preferred Skills

Databricks Experience: Hands-on experience with Databricks, including workspace administration, cluster management, Unity Catalog, Delta Lake, and Lakehouse architectures. Familiarity with Databricks workflows, job orchestration, and MLflow integration.
Advanced Cloud & ML Platform Expertise: Experience with Azure ML, SageMaker, or similar ML platforms. Familiarity with model serving, feature stores, and ML pipeline orchestration.
ML Frameworks Familiarity: Knowledge of ML frameworks such as TensorFlow, PyTorch, or scikit-learn to better support ML engineering teams.
Enterprise Security: Experience working in complex enterprise environments with strict security and compliance requirements. Strong networking fundamentals, including configuring and maintaining secure mTLS-based communication between services.
DevOps & Platform Innovation: Experience implementing self-service platform automation, developer portals, or internal developer platforms (IDPs).
Continuous Learning: Motivation to explore emerging technologies, especially in AI, generative AI, and cloud-native infrastructure. Certifications, personal projects, or open-source contributions are a plus.
Corporate Security Responsibility

All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization and, therefore, it is expected that every person working for, or on behalf of, Mastercard is responsible for information security and must: abide by Mastercard's security policies and practices; ensure the confidentiality and integrity of the information being accessed; report any suspected information security violation or breach; and complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.