In the dynamic world of DevOps and continuous delivery, keeping applications reliable and high-performing is a top priority.
Site reliability engineers (SREs) rely on Service Level Objectives (SLOs) to set the targets that an application's Service Level Indicators (SLIs) must meet. An SLI can be response time, error rate, or any other metric relevant to the application.
The use of SLOs is not a new concept, but integrating them into an application comes with its own set of issues:
- Figuring out which SLIs and SLOs to use. Do the SLI values come from one monitoring source or from several? This complexity makes it harder to use SLOs effectively.
- Defining SLO priorities. Imagine a new version of a service that fixes a concurrency problem but slows down response time. This may be a valid trade-off: the new version should not be rejected because of the slower response time, given that the error rate will decrease. Situations like these call for a grading logic in which different priorities can be assigned to SLOs.
- Defining and storing SLOs. It's crucial to clearly define and store these goals in one central place, ideally as a declarative resource in a GitOps repository, where each change can be easily traced back.
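To make the idea of prioritized SLOs concrete, here is a minimal Python sketch of a weighted grading logic. It is a simplified illustration, not Keptn's actual implementation: the `Objective` class, the `score` function, the thresholds, and the 60% pass mark are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    """One SLO: a measured SLI value, a failure threshold, and a priority."""
    name: str
    value: float       # measured SLI value (hypothetical numbers below)
    fail_above: float  # the objective fails if value exceeds this threshold
    weight: int = 1    # higher weight means higher priority


def score(objectives, pass_percentage=60.0):
    """Return the achieved percentage of weighted points and the verdict."""
    total = sum(o.weight for o in objectives)
    achieved = sum(o.weight for o in objectives if o.value <= o.fail_above)
    percentage = 100.0 * achieved / total
    return percentage, percentage >= pass_percentage


# The new version fixes the concurrency bug (error rate is fine)
# but responds more slowly (response time objective fails).
objectives = [
    Objective("error_rate_percent", value=0.5, fail_above=1.0, weight=3),
    Objective("response_time_ms", value=450.0, fail_above=400.0, weight=1),
]

percentage, passed = score(objectives)
print(f"score: {percentage:.1f}%, passed: {passed}")
```

Because the error rate carries three times the weight of the response time, the slower but more correct version still passes. Swapping the weights would make the same measurements fail the evaluation.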
In this article, we'll explore how Keptn tackles these challenges with its new Analysis feature. We will deploy a demo application onto a Kubernetes cluster to show how Keptn helps SREs gather and make sense of SLOs, making the whole process more straightforward and efficient.
The example application will expose some metrics itself through its Prometheus endpoint, while other data will come from Dynatrace.