We are growing, and our Operations Department is looking for a new colleague to join our international team!
Responsibilities
Interact daily with systems in different geographical locations to ensure their health and maintenance, keeping hardware, software, application and network operating at peak performance
Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes
Troubleshoot issues across the entire stack: hardware, software, application and network
Drive standardization efforts across multiple disciplines and services in conjunction with SREs throughout the organization
Identify and drive opportunities to improve automation for the company; scope and create automation for deployment, management and visibility of our services
Represent the SRE organization in design reviews and operational readiness exercises for new and existing services
Work with software engineers to improve deployment processes
Participate in the on-call rotation for production systems
Requirements
Sound fundamentals in operating systems, networking, and distributed systems
Strong familiarity with Linux systems administration and management best practices
Familiarity with container technologies: Kubernetes, CRI, Docker, namespaces, cgroups