Senior SRE / DevOps Engineer (Atlanta) (Alpharetta) Job at Broad Reach Partners, Alpharetta, GA

  • Broad Reach Partners
  • Alpharetta, GA

Job Description

We are seeking a Site Reliability Engineer to join our team in Atlanta and play a key role in enhancing the stability, performance, and reliability of our production systems. You'll work closely with development, DevOps, and security teams to improve observability, optimize system performance, and ensure production readiness. From monitoring to automation, you'll make a direct impact on our cloud infrastructure and service reliability.

In this role, you will work hand-in-hand with our development, operations, and security teams worldwide to implement best practices, automate deployments, and ensure our platforms are reliable, secure, and scalable. Troubleshooting in Kubernetes is required, so you will need a deep understanding of pods, nodes, networking, scaling, logs, and service-to-service communication.

This role requires a deep understanding of SRE best practices and a strong ability to troubleshoot complex issues.

Your responsibilities in this role will include:

  • Maintain and enhance monitoring tools (New Relic, Graylog) for service health and performance metrics.
  • Implement and maintain high-availability systems with capacity planning, performance optimization, and fault tolerance.
  • Define and monitor Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs) with teams.
  • Deploy and manage Kubernetes workloads to AWS EKS(A) using Helm and ArgoCD.
  • Automate operational processes to reduce manual interventions.
  • Manage Kubernetes workloads on AWS EKS for secure and stable deployments.
  • Participate in on-call rotation, troubleshoot production issues, and implement permanent fixes.
  • Work with DevOps to improve CI/CD pipelines and with development teams to embed resilience and observability.
  • Document operational runbooks, escalation procedures, and production playbooks.

We are looking for you to have the following skills and experience:

  • 8+ years of experience as a Site Reliability Engineer, or equivalent
  • Experience with tools like New Relic for monitoring and Graylog for logging.
  • 3+ years of experience with Amazon Web Services (AWS) or Microsoft Azure
  • 3+ years of experience with Kubernetes clusters, including performance monitoring in Kubernetes.
  • Proficiency with public cloud environments (AWS preferred)
  • Proficiency in a scripting language such as Bash, Groovy, or Python
  • Excellent debugging and troubleshooting skills.
  • Ability to prioritize tasks efficiently and independently under minimal supervision.

Nice to Have

  • AWS Cloud certification
  • Familiarity with .NET applications
  • Knowledge of Terraform, Ansible, and monitoring tools

This is a full-time role. Unfortunately, we can't offer sponsorship, so you must be a U.S. citizen or a green card holder. You must currently live in the Atlanta area, as you will need to come into our Atlanta office once or twice each month for key meetings with our team.

If you thrive on solving complex technical challenges, have a passion for automation, and want to influence how enterprise platforms evolve and modernize, this is an ideal opportunity for you.

Job Tags

Permanent employment, Full time, Part time, Live in, Work at office, Worldwide
