encomium to site reliability engineering

The charter of a site reliability engineer is to reduce toil. We "hold the phone" - when something goes wrong, we get called. We also solve for slack - our need is felt when everything is always on fire, or when nobody can overcome the friction of a system to actually do anything meaningful. We are meant to learn scientifically and solve strategically. There is an agentic quality to the role.

There is no way other than holding the phone to understand the peculiar, illegible problems that arise at the leading edge of any sufficiently complex system: the work that is just complex enough that automating it takes more effort than it's worth, at least as far as anyone can tell on short timescales. The role exists because that illegible work done at the edge is a goldmine for insight - reality is surprisingly detailed, and its detail betrays otherwise inaccessible truths about its deep nature. Doing engineering at that edge involves learning scientifically and solving strategically.