We're looking for a hands-on engineer/technician to assist with the setup, maintenance, and operation of our high-performance computing cluster.
This role is ideal for someone with practical experience in Linux systems in the data center who enjoys working in a fast-paced technical environment.
Key Responsibilities
Rack, stack, cable, and maintain equipment in the AI data center and lab.
Perform routine maintenance and troubleshooting on Linux servers, storage and networking systems.
Use tools to monitor and troubleshoot hardware issues.
Work closely with engineers and developers to ensure smooth operation of the AI infrastructure.
Required Skills/Experience
Experience with assembly of mechanical or electrical systems, or performing component-level repairs and troubleshooting on technical equipment.
Ability to lift and move 50 lb (23 kg) of equipment and to sustain physical exertion over extended periods, including frequent bending, kneeling, climbing, pushing, pulling, and lifting.
Experience working within a data center or network operation center environment.
Comfortable working in a Linux environment and able to diagnose and troubleshoot issues in operating systems, computer/server hardware, or the networking stack.
Able to write and understand simple Bash or Python scripts.
Exposure to Git, Jenkins, or similar tools is a plus.
Role Overview
This role is a hands-on, hardware-focused technical support position supporting GPU/compute clusters in an AI lab/R&D environment. The emphasis is on hardware troubleshooting, Linux-based system support, and deep understanding of compute architecture, rather than software development.
Key Responsibilities
Troubleshoot GPU/CPU servers, compute clusters, and networking (InfiniBand)
Diagnose hardware issues (cabling, components, GPUs, servers)
Rack/stack work is initially limited (systems are already built) but may increase if the role is extended
Replace/install server components within racks
Use Linux command line extensively for diagnostics and system validation
Manage lab space and hardware inventory (re-procurement access provided)
Must-Have Skills (Non-Negotiable)
Strong hardware troubleshooting experience (servers, GPUs, compute systems)
Solid understanding of computer/compute architecture
Strong Linux skills for system bring-up and troubleshooting
Experience with GPUs and high-performance compute environments
Ability to independently diagnose and resolve hardware/system issues
Preferred / Nice-to-Have
Prior data center or HPC/compute cluster experience (plus, not mandatory)
Scripting experience (Bash, Python); expected if the candidate has held similar roles
Familiarity with GPU technologies (cutting-edge R&D GPUs; Tesla, etc.)
Candidates who've built systems themselves (gaming PCs, lab servers, small data centers)
Experience & Education
Minimum: 3-4 years of relevant experience (not pure sysadmin only)
Bachelor's degree preferred, but experience matters more than degree