More talks in the program:
12:00 - 13:00
When former Google SRE Mikey Dickerson had to explain the Google way of increasing a site’s reliability to outsiders, he came up with his famous “Hierarchy of Service Reliability”, modeled on Maslow’s even more famous “Hierarchy of Needs”. In Dickerson’s hierarchy, monitoring forms the base and the actual product sits at the very top, with a number of layers in between. It is one of the most meaningful illustrations in the O’Reilly book “Site Reliability Engineering: How Google Runs Production Systems”. SoundCloud’s journey to a reliable site essentially meant climbing that hierarchy. However, we had to learn – often the hard way – that we could not just copy Google’s SRE practices directly. We found ourselves applying “SRE in spirit” while adjusting the implementation details to the scale and culture of our organization. Since SoundCloud also follows a very radical school of DevOps, our story contains a good amount of productive cross-pollination between SRE and DevOps.