(USA) Director, Software Engineering


  • Location SUNNYVALE, CA
  • Career Area Software Development and Engineering
  • Job Function Software Development and Engineering
  • Employment Type Regular/Permanent
  • Position Type Salary
  • Requisition WD806332

Position Summary...

What you'll do...

As a member of the SRE team, you will work with other DevOps and Engineering practitioners to support mission-critical infrastructure and to develop and build tools and processes that ensure the highest levels of availability and reliability for all Walmart eCommerce systems and services.


The mission of our Customer Reliability Center is to operate always-on, self-healing, fault-resilient, customer-centered, proactive systems that deliver the Walmart customer experience across multiple internet-facing eCommerce applications, databases, platforms, and technology stacks.


As Director of the SRE Customer Reliability Center (CRC), you will manage and direct high-performing teams of talented engineers responsible for Production Support of customer-facing eCommerce applications, infrastructure, and digital platforms, including Walmart.com. In addition to Production Support for Public Cloud and On-Premises hosted systems and applications, you will be expected to work with management, peers, and customers to define and implement the technical vision of the team and to adopt SRE practices.


The Director of the SRE Customer Reliability Center (CRC) is a key member of the Leadership team. Primary responsibilities include the development of the overall strategy, analysis, design, development, and purposeful execution of SRE-based best practices across the eCommerce ecosystem. This role will also challenge the status quo of traditional operations and consistently seek opportunities to re-engineer processes and restructure team roles and responsibilities to align with the strategic Site Reliability Engineering (SRE) model. This will be accomplished through passionate situational leadership, inspiration, mentoring, and exceptional management of a large team of technical professionals executing key initiatives through the lifecycle.


You are right for the job if: 

  • You’re a servant leader 

  • You have built and led high-performance teams

  • You love to solve complex distributed systems problems 

  • You have high standards and can hold your team and partners accountable 

  • You’re stimulated by challenges and are ready to engage at Fortune 1 scale 

  • You have enthusiasm for digging deep, and a flair for sharp technical communication, prioritization, and organization.

  • You have a passion to build our next generation “always on” and “highly available” cloud-based e-commerce/Retail and Enterprise platform. 

Site Reliability Engineers are hybrid systems and software engineers who take ownership of reliability, scalability, automation, and other issues related to the uptime and availability of Walmart’s e-commerce/Retail and Enterprise platform. Our goal is to build, scale, and guard the systems that delight our customers. To do so, you will need strong skills in the following areas:

  • Improve the reliability, latency, availability, and scalability of Walmart e-commerce/Retail and Enterprise products. 

  • Engender reliability and availability starting with metrics and measurements. 

  • Enable scaling by providing tools, developing training and/or augmenting processes. 

  • Build tools and automation to prevent recurrence of problems in mission-critical products and services.

  • Augment existing instrumentation to build a cohesive picture of the characteristics of our systems with special attention to points of failure. 

  • Participate in capacity planning, demand forecasting, software performance analysis and system tuning. 

  • Develop a deep understanding of the numerous services and applications that come together to deliver Walmart e-commerce/Retail and Enterprise products. 

  • Design new monitoring tools and smart alerts that help discover failures and issues in a timely fashion, and work with engineers to identify root causes and fix issues.

  • Influence, design and create new architectures, standards, and methods for large-scale enterprise systems. 

  • Perform root-cause analysis of complex problems involving multiple parties, networks, hardware, and software that relate to scaling and performance.

  • Maintain a strong focus on collecting metrics and documenting what they show, for use by others who build and maintain systems.

  • Build and drive the automation systems that maintain system health.

  • Eliminate single points of failure and regularly test disaster recovery and high availability (HA).

  • Drives standardization and service-focused instrumentation. Leads the resolution of break/fix scenarios, engaging broader teams as necessary, and partners with or leads others to achieve continuous improvement. Contributes to command-and-control activities focused on the rapid restoration of service during complex outages. Provides mentoring and guidance to more junior team members.

  • Uses professional best practices in day-to-day work, such as revision control and unit testing. Applies statistical data analysis techniques.

  • Networking responsibilities: Capturing and interpreting network traffic with tcpdump, snoop, and other network sniffers. Understands and applies knowledge of most protocols (TCP/IP, HTTP, UDP, etc.).

  • Application Technologies: Provides recommendations and advice to the team and/or department in the areas of web services, OS, and storage, including being an active liaison to Development, QA, and the Business.  

  • Analyzes systems and makes recommendations to prevent potential problems. Takes lead on issue resolution activities using knowledge of complex and company-wide systems. 

  • Lead end-to-end audits of monitors and alarms based on subsystem knowledge.

  • Lead the resolution of issues in a timely and organized manner, effectively communicating necessary information.  

  • Consistent exercise of independent judgment and discretion in matters of significance. 

  • Other duties and responsibilities as assigned. 


What you'll bring...

  • 10+ years in an SRE, DevOps, or Software Engineering role.

  • 10+ years in an Operations organization supporting 24/7 mission-critical applications and services  

  • Experience in designing, investigating, analyzing, and troubleshooting large-scale enterprise systems. 

  • Methodical and systematic problem-solving approach, combined with a solid awareness of ownership, initiative, and drive. 

  • Fluency with running services at scale; in-depth understanding of Unix systems internals and networking.

  • Networking knowledge and in-depth understanding of network concepts, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.

  • Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way.  

  • Programming experience in one or more of the following languages: Go, Kotlin, Java, Python, Ruby, Shell 

  • Bachelor's Degree in Computer Science or a related field, or relevant work experience 

  • Experience with distributed version control such as Git

  • Experience with IaaS and PaaS providers such as AWS, Azure, OpenStack, GCP

  • Experience with containerization and container platforms (e.g., Docker, Kubernetes, Docker EE, OpenShift, Mesosphere).

  • Experience with enterprise monitoring solutions like Dynatrace, AppDynamics, New Relic, Prometheus, Graphite, Grafana, Nagios, Sensu and Splunk 

  • Familiarity with continuous integration/deployment processes and tools such as Jenkins, Maven, Nexus, etc.

Benefits & Perks

Beyond competitive pay, you can receive incentive awards for your performance. Other great perks include 401(k) match, stock purchase plan, paid maternity and parental leave, PTO, multiple health plans, and much more.


Who We Are

What started small, with a single discount store and the simple idea of selling more for less, has grown over the last 50 years into the largest retailer in the world. Each week, over 260 million customers and members visit our 11,695 stores under 59 banners in 28 countries and e-commerce websites in 11 countries. With fiscal year 2017 revenue of $485.9 billion, Walmart employs approximately 2.3 million associates worldwide. Walmart continues to be a leader in sustainability, corporate philanthropy and employment opportunity. It's all part of our unwavering commitment to creating opportunities and bringing value to customers and communities around the world.

About Global Tech 

Imagine working in an environment where one line of code can make life easier for hundreds of millions of people and put a smile on their face. That’s what we do at Walmart Global Tech. We’re a team of 15,000+ software engineers, data scientists and service professionals within Walmart, the world’s largest retailer, delivering innovations that improve how our customers shop and empower our 2.2 million associates. To others, innovation looks like an app, service or some code, but Walmart has always been about people. People are why we innovate, and people power our innovations. Being human-led is our true disruption. 


We’re virtual 

Working virtually this year has helped us make quicker decisions, remove location barriers across our global team, be more flexible in our personal lives and spend less time commuting.  Today, we are reimagining the tech workplace of the future by making a permanent transition to virtual work for most of our team. Of course, being together in person is an important part of our culture and shared success. We’ll collaborate in person at a regular cadence and with purpose. 

Minimum Qualifications...

Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.

As permitted by applicable law, provide evidence of full vaccination as defined by CDC guidelines OR secure approval of medical or religious accommodation for the vaccination mandate.
Bachelor’s degree in Computer Science and 6 years’ experience in software engineering or related field OR 8 years’ experience in software engineering or related field.
3 years’ supervisory experience.

Preferred Qualifications...

Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.

Master’s degree in Computer Science or related field and 5 years' experience in software engineering

Primary Location...

680 WEST CALIFORNIA AVENUE, SUNNYVALE, CA 94086-4834, United States of America

About Walmart

At Walmart, we help people save money so they can live better. This mission serves as the foundation for every decision we make, from responsible sourcing to sustainability—and everything in between. As a Walmart associate, you will play an integral role in shaping the future of retail, tech, merchandising, finance and hundreds of other industries—all while affecting the lives of millions of customers all over the world. Here, your work makes an impact every day. What are you waiting for?

Walmart, Inc. is an Equal Opportunity Employer – By Choice. We believe we are best equipped to help our associates, customers and the communities we serve live better when we really know them. That means understanding, respecting and valuing diversity – unique styles, experiences, identities, ideas and opinions – while being inclusive of all people.

Hello, Silicon Valley

You don’t have to choose between your career and your lifestyle in Silicon Valley. Here, you can have both.

All the benefits you need for you and your family

  • Multiple health plan options, including vision & dental plans for you & dependents
  • Financial benefits including 401(k), stock purchase plans, life insurance and more
  • Associate discounts in-store and online
  • Education assistance for associates and dependents
  • Parental Leave
  • Pay during military service
  • Paid time off, including vacation, sick, and parental leave
  • Short-term and long-term disability for when you can't work because of injury, illness, or childbirth

Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific plan or program terms. For information about benefits and eligibility, see One.Walmart.com/Benefits.
