Site Reliability Engineer II
Department
Open Positions
Location
Education/Qualification
Bachelor’s degree in Computer Science, Information Technology, or a related field.
Years Of Exp
Salary
Posted On
November 28, 2023
Designation
DevOps Engineer
Job Description
We are looking for a skilled and motivated DevOps/Site Reliability Engineer with at least 2 years of experience to join our fast-paced team. In this role, you will maintain the reliability and performance of our systems and help automate and improve our development and operations processes.
Key Responsibilities:
- System Reliability: Monitor, maintain, and enhance the reliability and availability of our systems to keep downtime to a minimum.
- Infrastructure as Code (IaC): Collaborate with the development and operations teams to design, implement, and manage infrastructure as code using tools such as CloudFormation, Terraform, Ansible, or Puppet.
- Automation: Develop and maintain automated deployment pipelines and scripts for continuous integration and continuous deployment (CI/CD) processes to streamline software releases.
- Containerization: Work with containerization technologies like Docker Swarm and AWS ECS to manage and orchestrate application containers.
- Monitoring and Alerting: Set up and maintain monitoring and alerting systems, including tools like CloudWatch, Datadog, Zenduty, and New Relic, to proactively identify and resolve performance issues.
- Scalability and Performance: Collaborate with the development team to optimize the performance and scalability of applications and infrastructure.
- Security: Implement and maintain security best practices throughout the development and operations pipeline, ensuring the protection of sensitive data.
- Incident Management: Participate in incident response and root cause analysis to resolve critical system outages and develop preventive measures.
- Documentation: Create and maintain clear, comprehensive documentation of system architecture, deployment processes, and best practices.
- Collaboration: Work closely with cross-functional teams to facilitate communication and knowledge sharing between development, operations, and other departments.
Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- A minimum of 2 years of experience in DevOps, System Operations, or Site Reliability Engineering roles.
- Strong knowledge of Linux-based systems and shell scripting.
- Proficiency in the AWS cloud platform.
- Understanding of networking, security, and best practices for maintaining secure infrastructure.
- Knowledge of the ELK stack and Kafka is required.
- Experience with containerization technologies (Docker Swarm or AWS ECS).
- Hands-on experience with automation and configuration management tools (CloudFormation, Terraform, and Ansible).
- Familiarity with CI/CD pipelines and version control systems (e.g., GitLab CI, Jenkins, Git).
- Working knowledge of relational databases such as MySQL and PostgreSQL.
- Willingness to serve as an L1 incident responder on a rotating basis.
- AWS certification (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect) is a plus.