Site Reliability Engineer
Amber Group
Software Engineering
Hong Kong
Posted on Aug 11, 2025
Role and Responsibilities:
- Support and maintain Kubernetes-based infrastructure primarily on AWS EKS
- Build and enhance automation for provisioning, configuration, monitoring, and scaling of cloud-native environments
- Collaborate closely with engineering teams to ensure platform reliability, performance, and operational excellence
- Implement and manage secure processes for data and secret rotation across environments
- Develop tools and practices to improve observability, reliability, and incident response
- Provide technical leadership and mentorship, and promote best practices in Kubernetes, automation, and cloud operations
- Manage project priorities, milestones, and deliverables in a fast-paced environment
Qualifications:
- Deep expertise with Kubernetes (EKS preferred) in production environments
- Strong hands-on experience with AWS services, including IAM, EKS, EC2, and S3
- Proficiency in data and secret rotation strategies and tooling
- Proficiency in scripting and automation with Python and Bash
- Solid understanding of Linux fundamentals, including OS-level troubleshooting and performance tuning
- Experience with infrastructure-as-code and deployment tools such as Terraform, Helm, or ArgoCD
- Familiarity with container networking, observability tooling, and CI/CD best practices
- Proven ability to architect, develop, and troubleshoot distributed systems
- Strong problem-solving mindset, ownership, and communication skills
- Experience in high-scale, low-latency, or mission-critical environments is a plus