Our site reliability services for the IBM Cloud  

The Site Reliability Services (SRE) team at Hursley provides Kubernetes for many of the public services in the IBM Cloud. SRE runs the infrastructure and production systems for this service, focusing on automating “toil,” the work typically done manually by individual operators, and keeping manual steps to a minimum.

As well as supporting IBM, we support clients worldwide in transportation, travel, and manufacturing.

Come and talk to us about how we provision, monitor, and control this environment, and about the excitement and challenges we face in maintaining this essential service. Learn how, as key contributors to one of the world's largest cloud providers, we have adapted and innovated on tooling and dashboards to increase reliability and reduce "alert fatigue." Our work includes architecting and deploying global networking and developing techniques to automatically detect malicious containers.

We achieve our scale and availability goals through resilient architectures, deep automation and ChatOps in our cloud operations, and a highly available, efficiently scaling IBM Cloud container service. The container service offers advanced capabilities for building cloud-native apps, adding DevOps to existing apps, and relieving the pain around security, scale, and infrastructure management.


