DevOps Engineer – American Science Cloud (AmSC) Project
Experience Level: Mid-level to Senior
Work Location: Remote
Project Overview: American Science Cloud – A Platform for Transformative Science
The American Science Cloud (AmSC) is a secure, federated, and science-optimized cloud environment that brings together the Department of Energy’s leading computing systems, experimental facilities, data resources, and high-performance networks.
The platform enables DOE scientists to create, access, and integrate AI-ready datasets, run scalable model training on advanced systems, perform distributed simulations, control scientific instruments, and move data efficiently across multiple sites.
This initiative is a multi-lab, public-private partnership working alongside the Models Consortium (ModCon), which will deploy advanced AI models and services onto the platform.
The Team
As a Cloud/DevOps Engineer, you will join the L2 Infrastructure Services group within AmSC. You will support the multi-cloud central hub infrastructure across development, staging, pre-production, and production environments.
You will collaborate with other L2 science service teams building on top of this infrastructure, including teams focused on data catalogs, large-scale HPC compute services, user interfaces and APIs, and AI/MLOps operations.
Most contributors currently support AmSC on a part-time basis. You will be among the first engineers dedicated to the project full-time. Your primary responsibility will be enabling science teams by building foundational infrastructure and developing CI/CD pipelines for service deployment.
Major Duties/Responsibilities
- Administer Kubernetes clusters and support application deployments across environments
- Build and maintain pipelines for cloud infrastructure and science service deployment
- Manage container image registries such as Harbor
- Develop and maintain automation for provisioning and CI/CD using tools such as Terraform and Python, following GitOps practices
- Implement security controls in alignment with DevSecOps practices
- Configure instrumentation for infrastructure and services to support monitoring and alerting
- Provide operational support and engineering for production applications
- Define KPIs, improve processes, and drive continuous optimization
- Troubleshoot and resolve platform issues efficiently
- Participate in on-call rotation, including 24/7 support and scheduled maintenance
- Deploy and manage Kubernetes clusters (EKS, AKS, GKE, or equivalent), including upgrades, node lifecycle, networking, and multi-environment promotion
- Collaborate with vendors to resolve hardware and software issues
- Align work with core values: Impact, Integrity, Teamwork, Safety, and Service
- Foster a culture of diversity, equity, inclusion, and accessibility
Basic Qualifications
- Bachelor’s degree in Computer Science or a related field
- Minimum of 2 years of experience as a DevOps Engineer or Cloud Engineer (or equivalent combination of education and experience)
Preferred Qualifications
- Experience leading or managing DevOps or Cloud Engineering teams
- Strong communication and collaboration skills
- Knowledge of cloud architecture patterns and managed services (preferably AWS or another major provider)
- Experience with Kubernetes administration, CRDs, and deployment approaches such as GitOps workflows and Helm charts
- Solid understanding of Unix systems and networking protocols
- Strong grasp of cloud networking concepts
- Ability to identify performance issues and recommend improvements
- Experience gathering requirements and implementing solutions
- Strong organizational and time management skills, with the ability to work under minimal supervision
- Experience with CI/CD methodologies and tools
- Familiarity with version control platforms like GitHub or GitLab
- Experience with monitoring tools such as Nagios, Grafana, and Prometheus
- Experience with Terraform or OpenTofu in multi-account AWS environments (including AWS Organizations, SCPs, IRSA)
- Hands-on experience with Argo CD, including the App of Apps pattern and ApplicationSets
- Familiarity with Tanka, Jsonnet, or similar configuration-as-code tools
- Experience with API gateways such as Kong in Kubernetes environments
- Knowledge of secrets management solutions like AWS Secrets Manager or External Secrets Operator
- Exposure to research networks such as ESnet or Internet2 is a plus
Special Requirement
This position requires the ability to obtain and maintain a federal public trust clearance. It is classified as a Workplace Substance Abuse Program (WSAP) testing position, requiring a pre-employment drug test and participation in random drug testing. Employees must also report any drug-related arrests, convictions, or positive test results as required by ORNL policies.