Site Reliability Operations Engineer

  • Location SUNNYVALE, CA
  • Career Area Technology & Software Development
  • Job Function Software Development and Engineering
  • Requisition 917967BR

What you'll do

The SRC Site Reliability Operations Engineer is responsible for proactively monitoring, detecting, and resolving site issues before they affect customers or availability. You will need to understand the full end-to-end stack and use that knowledge to detect errors and failures and take corrective action to mitigate them. During a major incident, you will draw on your technical skills to triage, distinguishing symptom from cause, and help resolve the issues causing impact. Your ability to continuously challenge yourself and build a strong network within your peer group will help you excel in this role. Our goal is to protect the customer experience and deliver outstanding levels of availability.

Minimum Qualifications

- 3+ years in an infrastructure, systems, engineering or development environment delivering operational excellence to highly complex distributed systems.
- Bachelor's Degree in Computer Science or a related field, or relevant work experience.
- Strong and demonstrable incident management skills with relevant experience in an enterprise organization.
- Experience and exposure working in a 24/7 operations support environment.
- Methodical and systematic problem-solving approach, combined with a solid sense of ownership, initiative, and drive.
- Experience investigating, analyzing, and troubleshooting large-scale enterprise systems.
- Networking knowledge and understanding of network concepts, such as protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
- Programming experience in one or more of the following languages: Go, Java, Python, Ruby, Shell.
- Experience administering Unix/Linux in a production environment.
- Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way.
- Experience working with and developing enterprise monitoring/tooling solutions such as Grafana, Kibana, Splunk, Graphite, Nagios, New Relic, Graylog, and HPOM.
- Working knowledge of one or more cloud technologies such as AWS, Azure, or OpenStack.

Preferred Qualifications

- Actively provide data for and participate in root cause analysis.
- Adhere to SRC onboarding process when accepting new systems into service.
- Share knowledge globally between SRC teams.
- Analyze systems and make recommendations to prevent possible incidents.
- Strive for continuous improvement and make recommendations based on SRC process.
- Other duties and responsibilities as assigned.

About Walmart

At Walmart, we help people save money so they can live better. This mission serves as the foundation for every decision we make, from responsible sourcing to sustainability—and everything in between. As a Walmart associate, you will play an integral role in shaping the future of retail, tech, merchandising, finance and hundreds of other industries—all while affecting the lives of millions of customers all over the world. Here, your work makes an impact every day. What are you waiting for?
"I feel like my manager wants to help me become a better developer and a better person overall."
— Roel, Program Analyst

Hello, Silicon Valley

You don’t have to choose between your career and your lifestyle in Silicon Valley. Here, you can have both.

All the benefits you need for you and your family

  • Multiple health plan options
  • Vision & dental plans for you & dependents
  • Associate discounts in-store and online
  • Financial benefits including 401(k), stock purchase plans and more
  • Education assistance for associates and dependents
