About the role
The Site Reliability Engineer role evolved from an organizational need to fully embrace cloud infrastructure in dedicated, multi-tenant, and hybrid environments. Specifically, we are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.
Objectives:
- Run staging and production environments based on measurable metrics from the monitoring systems that allow for performance tuning and fault finding
- Improve reliability and quality of the supported environments by partnering with development teams on rigorous tests and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable environments through an "automate everything" approach
- Balance feature development speed and reliability with well-defined service level objectives
- Provide third-line on-call support for issue resolution related to production environments
- Support incident management teams and disaster recovery within an agreed SLA, and facilitate post-incident analysis
- Maintain systems documentation and knowledge base
Key Skills:
- Experience in running and operating Kubernetes in production at scale
- Experience with MySQL databases
- Experience with on-premises systems (physical firewalls, switches, servers, SAN, SAS)
- Knowledge of how to build CI/CD pipelines that allow daily releases
- Knowledge of the Git version control system
- Working experience with Linux-based infrastructure
- Ability to design scalable and highly available architecture in the cloud (Azure, AWS, GCP)
- Coding experience in any language (e.g. Java, Golang, Python, Bash)
- Experience in setting up monitoring and logging tools (Prometheus, Thanos, Grafana)
- Infrastructure as Code and automation (Terraform, Ansible, Helm)
- Cloud-native approach
- Knowledge of cloud and network security
- Working knowledge of various tools and open-source technologies
- Excellent troubleshooting skills
Nice to have:
- Experience in software development and infrastructure development