Staff Site Reliability Engineer
- Location: BENTONVILLE, AR
- Career Area: Information Technology
- Job Function: Information Technology
- Employment Type: Full Time
- Position Type: Salary
- Requisition: 1187519BR
What you'll do
You're right for the job if you're comfortable with deep technical topics in Linux, networking, and distributed architectures. You will work cross-functionally with a variety of teams and be a core contributor to every significant engineering service or solution that we deliver to our stakeholders. You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization and organization. You will work directly with our Software Engineering teams to build our next-generation “always up” cloud-based e-commerce/Retail and Enterprise platform.
Site Reliability Engineers are hybrid systems and software engineers who take ownership of reliability, scalability, automation, and other issues related to the uptime and availability of Walmart’s e-commerce/Retail and Enterprise platform. Our goal is to build, scale and guard the systems that delight our customers. To do so, you will need strong skills in the following areas:
- Design, write and build tools to improve the reliability, latency, availability and scalability of Walmart e-commerce/Retail and Enterprise products.
o Engender reliability and availability starting with metrics and measurements
o Enable scaling by providing tools, developing training and/or augmenting processes
o Build tools and automation to prevent recurrence of problems in mission-critical products and services.
- Augment existing instrumentation to build a cohesive picture of the characteristics of our systems with special attention to points of failure.
- Participate in capacity planning, demand forecasting, software performance analysis and system tuning.
- Develop a deep understanding of the various services and applications that come together to deliver Walmart e-commerce/Retail and Enterprise products
- Design new monitoring tools and smart alerts that help discover failures and issues in a timely fashion, and work with engineers to identify root causes and fix issues.
- Influence, design and create new architectures, standards and methods for large-scale enterprise systems.
- Perform root-cause analysis of complex problems involving multiple parties, networks, hardware and software that relate to scaling and performance.
- Participate in on-call rotation.
- Secure the system from issues, be they real, perceived or notional
- High focus on collecting and inferring metrics
- Experience with configuration management tools such as Ansible, SaltStack, Chef and Puppet
- Build and drive the automation systems that maintain system health
- Eliminate single points of failure and regularly test disaster recovery and high availability (HA).
- Bachelor of Science and 6 years' experience in software engineering OR Master of Science and 3 years' experience in software engineering.
- Creates systems engineering and architectural documentation to be used by others to build and maintain systems.
- Scripting and Development responsibilities: Develops software in several modern languages. Develops large, complex database-backed systems and understands DB schema design and query performance. Utilizes professional best practices in day-to-day work, such as revision control and unit testing. Applies statistical data analysis techniques.
- Networking responsibilities: Understands and performs packet captures with tcpdump, snoop, and other network sniffers. Understands and applies knowledge of most protocols (TCP/IP, HTTP, UDP, etc.).
- Application Technologies: Provides recommendations and advice to the team and/or department in the areas of web services, OS, and storage, including being an active liaison to Development, QA and the Business.
- Analyzes systems and makes recommendations to prevent possible problems. Takes lead on issue resolution activities using knowledge of complex and company-wide systems.
- Lead end-to-end audit of monitors and alarms based on subsystem knowledge.
- Utilizes time management and project management skills to lead the resolution of issues in a timely and organized manner, effectively communicating necessary information. May consult directly with developers or third party vendors; provides subject matter expertise.
- Consistent exercise of independent judgment and discretion in matters of significance.
- Other duties and responsibilities as assigned.
Hello, NW Arkansas
With over 200 miles of trails, an emerging locally sourced food scene, and the world-renowned Crystal Bridges Museum, NWA has something for everyone.
Crystal Bridges Museum
Celebrate the American spirit in a setting that unites the beauty of art and the power of nature.
Walton Arts Center
Arkansas' premier center for visual arts and entertainment.
An interactive children's museum that's fun for the whole family.
42 acres of premier public garden space.
Devil's Den State Park
Located on 2,500 acres, Devil's Den State Park is the perfect place to explore Arkansas' natural beauty.
The best of shopping and restaurants, right in the heart of Fayetteville.
All the benefits you need for you and your family
- Multiple health plan options
- Vision & dental plans for you & dependents
- Associate discounts in-store and online
- Financial benefits including 401(k), stock purchase plans and more
- Education assistance for Associates and dependents