Principal Engineer, Systems Software
Company: NVIDIA Corporation
Location: Santa Clara
Posted on: October 23, 2024
Job Description:
We are looking for a
Principal Software Engineer with experience in building highly
scalable and reliable software to join us. We are building a
powerful operational automation platform for GPU clusters to
improve their performance and utilization while reducing
operational toil.
What you'll be doing:
- Architecting the product to discover cluster resources such as
hosts, GPUs, and switches, and automate debug and repair actions on
these resources
- Designing the platform to support GPU clusters across different
CSPs and platforms such as Kubernetes and Slurm
- Developing a distributed workflow execution runtime for
parallel and fault-tolerant actions on a large number of
resources
- Operating critical software services with high availability and
reliability for customers
- Influencing the product roadmap in collaboration with teams
across various departments with the goal of reducing SRE toil and
improving hardware utilization
- Optimizing system performance to increase scalability and
improve user experience
- Leading and delivering high-impact projects with high quality,
performance, and stability at the lowest resource consumption
- Elevating the productivity and creativity of the technical
staff by optimizing engineering practices, guiding junior engineers
and providing quality design and code reviews
- Programming in systems languages like Go and Rust
What we need to see:
- Bachelor's or Master's degree in Computer Science, Engineering,
or a related field (or equivalent experience)
- 15 years of equivalent experience
- Demonstrated ability in building scalable and robust
distributed systems
- Proven record of product rollouts and collaborating with early
adopters
- Proficiency in programming in Go, Rust, C/C++, or Java
- Technical stewardship of projects across the organization
Ways to stand out from the crowd:
- Deep understanding of concurrency and distributed systems
concepts
- Experience handling large, complex systems
- Experience with SRE, DevOps, and platforms
NVIDIA is leading the
way in groundbreaking developments in Artificial Intelligence,
High-Performance Computing and Visualization. The GPU, our
invention, serves as the visual cortex of modern computers and is
at the heart of our products and services. Our work opens up new
universes to explore, enables amazing creativity and discovery, and
powers what were once science fiction inventions from artificial
intelligence to autonomous cars. NVIDIA is looking for great people
like you to help us accelerate the next wave of artificial
intelligence.
The base salary range is 272,000 USD - 419,750 USD.
Your base salary will be determined based on your location,
experience, and the pay of employees in similar positions. You will
also be eligible for equity and benefits.
NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a
diverse work environment and proud to be an equal opportunity
employer. As we highly value diversity in our current and future
employees, we do not discriminate (including in our hiring and
promotion practices) on the basis of race, religion, color,
national origin, gender, gender expression, sexual orientation,
age, marital status, veteran status, disability status or any other
characteristic protected by law.