Senior DevOps/SRE Engineer (Ellicott City) Job at VITG, Ellicott City, MD

  • VITG
  • Ellicott City, MD

Job Description

We are seeking a skilled mid-level Senior DevOps Site Reliability Engineer (SRE) to ensure the reliability, availability, and performance of enterprise services hosted across Cloud Service Providers (CSPs) and on-prem data centers. The SRE is responsible for the practical implementation of Site Reliability Engineering (SRE) principles through best practices, operations, and monitoring. Speed and stability are carefully balanced, and the SRE team acts as versatile problem solvers, filling gaps in knowledge and expertise to ensure efficient software operations.

If you are a proactive problem solver with a passion for continuous learning and innovation, join us as we endeavor to increase the dynamism and efficacy of our DevOps practices.

Applicant Requirements:

  • Must be a US citizen or must be authorized to work in the United States.
  • Must have lived in the USA for three (3) of the last five (5) years.
  • Must be able to obtain a US federal government badge and be eligible for Public Trust clearance.
  • Must be able to pass a VITG background check, including a drug test.

We're looking for candidates who:

  • Demonstrate hands-on expertise in SRE principles, with a strong understanding of maintaining quality and stability of enterprise services in a continuous development environment
  • Possess experience designing and developing solutions using various AWS services
  • Possess experience developing scripts in Shell/Bash and Python and deploying them as AWS Step Functions or Lambda functions
  • Possess experience monitoring with and administering observability tools like Splunk, Datadog, and New Relic
  • Possess extensive knowledge in troubleshooting issues while leveraging monitoring tools like Splunk, Datadog, New Relic, AWS services, etc.
  • Possess skill in analyzing, identifying, and documenting root causes
  • Possess a strong technical background and can provide clear explanations of technical concepts verbally and in writing
  • Demonstrate ability and passion to learn new technologies quickly and perform Proof of Concepts (POCs) based on project needs
  • Apply strong problem-solving skills in monitoring system performance, troubleshooting issues, crisis management, etc.
  • Produce high-quality work independently and collaboratively
  • Excel in a fast-paced environment
  • Demonstrate effective communication and collaboration as a team player

Job Responsibilities:

  • Design and develop monitoring solutions leveraging approved AWS services using Infrastructure as Code (IaC) tools.
  • Develop and maintain CI/CD pipelines using GitHub and Jenkins.
  • Develop serverless functions and scripts using Python, curl, and/or Bash.
  • Leverage observability best practices to proactively identify potential software issues and implement preventive measures to minimize potential for system incidents and outages.
  • Set and monitor critical metrics to gain insights into system reliability, including latency, traffic, errors, and saturation levels.
  • Learn and adopt new technologies to perform Proofs of Concept (POCs) based on project needs.
  • Provide guidance, training, and support for external development teams to manage their infrastructure independently.
  • Develop, publish, and maintain all required documentation in the repository and ticketing system (i.e., Confluence and Jira).
  • Respond quickly and effectively to critical incidents, and conduct post-incident reviews to identify root causes and implement preventive measures.
  • Collaborate effectively with cross-functional teams and communicate SRE concepts and recommendations clearly to both technical and non-technical stakeholders.
  • Participate in reliability-based release management processes.
  • Plan, participate in, and manage on-call rotations to ensure prompt response to reported performance and reliability issues.
  • Attend ongoing and ad hoc meetings with internal and external stakeholders.
  • Stay up-to-date with the latest industry trends, technologies, and best practices related to SRE, DevOps, and infrastructure management.

Our Tech Stack (Must have):

  • CI/CD: GitHub, Jenkins, Terraform, CloudFormation, Containers, Docker
  • Cloud Infrastructure: AWS, Azure
  • Monitoring & Alerting: Datadog, AWS CloudWatch (including canaries and X-Ray), Splunk (Enterprise, ITSI, and On-Call), New Relic
  • OS: Windows servers, Amazon Linux, Red Hat, Citrix VDI

Certifications

  • AWS Certified SysOps/DevOps Associate or equivalent AWS certification (Required)
  • Splunk Core Certification (Strongly Preferred)
  • Datadog Certification (Strongly Preferred)

Job Type: Full Time (No 1099 or C2C)

Salary: BOE

Benefits:

  • 401(k) with employer contribution
  • Medical/Dental/Vision insurance (option for full coverage for employee)
  • Life, ST/LT insurance
  • Professional development opportunities
  • Company-paid holidays and paid vacation (PTO)

Schedule:

  • 8-hour shift during core business hours
  • May include minimal after-hours support depending on on-call schedule

Work Type:

  • Currently hybrid remote in Ellicott City, MD 21043
  • Minimum 2 days in office weekly
