# Multi-Stage Delivery using GitOps
In multi-stage environments it can be a challenge to see how a particular version of a workload progresses through the different stages. When something goes wrong in one of the deployment stages, this makes it difficult to trace exactly which modification introduced the problem.
Keptn helps to address this challenge by providing a distributed OpenTelemetry trace that encompasses all deployment stages and contains all relevant information, such as the Git commit ID that triggered the deployment of a workload. For example, if the evaluation of a load test in one of the deployment stages is failing, the distributed trace generated by Keptn contains details about the result of the evaluation, as well as a link to the deployment trace of the previous stage. This makes it easy to trace back the deployment of that particular workload across the previous stages, right until the original commit that resulted in the performance degradation.
This blog post demonstrates an example workflow that automates the promotion of a sample application across two different stages. The deployment traces of those stages are linked together and enriched with valuable metadata, such as the commit ID that triggered the deployment of a new workload version.
## Technologies used for this example
For this, we are using the following technologies:

- The new `KeptnAppContext` resource, which can pass metadata to the generated deployment traces and define a `promotion` task that is executed once the application is deployed and all post-deployment checks have been executed successfully.
- ArgoCD as a GitOps tool. In addition to automatically synchronizing the cluster with the desired state defined in the GitOps repository, ArgoCD also adds metadata (such as the Git commit ID that triggered the last sync) to the `KeptnAppContext` resource.
- GitHub Actions: The GitOps repository is hosted on GitHub, so we can use GitHub Actions to implement the promotion of an artifact from one stage to the next. We do this by running a workflow that creates the pull requests for updating the ArgoCD `Application` resource in the different stages.
- Helm: The configuration of the application for each stage is maintained via two separate Helm charts, one for each stage.
- OpenTelemetry Collector/Jaeger: The deployment traces are gathered by the OpenTelemetry Collector and forwarded to Jaeger, which displays the generated traces graphically.
- Prometheus: Provides monitoring data for the application.
Note that we assume these tools are already installed on the Kubernetes cluster; walking through the installation of each of them would exceed the scope of this blog post.
We are going to do the following:

- Set up the environment by:
  - Setting up a GitHub repository with an access token, GitHub workflows, and GitHub Actions
  - Preparing a Kubernetes namespace for each stage (`dev` and `prod`)
  - Preparing the ArgoCD `Application` resources with an appropriate Helm chart for each
  - Applying labels to associate the `Deployment` resource with the `KeptnWorkload` resource
  - Defining Keptn pre-/post-deployment checks and tasks
  - Defining the metadata to be passed through the deployment traces
  - Defining a `traceParent` that links the deployment traces of the `prod` stage to those of the `dev` stage
- Run the promotion flow by:
  - Creating a pull request to update our `dev` environment
  - Merging the automatically created pull request to promote the updated version into `prod`
  - Inspecting the generated deployment traces for both stages and seeing how they are connected with each other
## Setting up the Environment
Now it's time to set up our environment and connect all the tools mentioned above with each other.
### Set up the GitHub repository
First things first: since we talk about GitOps in this article, we need a Git repository to host the Helm chart of our application. We use GitHub in this example, which allows us to use GitHub Actions to implement the promotion from `dev` to `production`.
In this example, we are using this repository as an upstream repository. If you would like to try the demo yourself, feel free to fork this repository and start experimenting with Keptn from there.
### Create personal access token
We need to create a personal access token for accessing the GitHub API. This token will be used by the container running the `promotion` task during the post-deployment phase of the `KeptnApp` within the `dev` stage. The container uses this access token to trigger a GitHub action that creates a pull request to promote the version that has been deployed from `dev` into `production`.

Using GitHub Actions rather than interacting directly with the Git repository in the container that executes the `promotion` step lets us avoid granting the container any write permissions to the repository. Instead, we use GitHub's fine-grained access tokens to restrict the permissions so that the token can only trigger workflow actions, exclusively within our GitOps repository. The required permissions are highlighted in the screenshot below:
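Under the hood, triggering such a workflow boils down to a `workflow_dispatch` call against the GitHub REST API. As a minimal sketch (the owner/repo placeholders and the `traceParent` value are assumptions; the token is the one created above):

```shell
# Dispatch the promote workflow via the GitHub API.
curl -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${GH_API_TOKEN}" \
  https://api.github.com/repos/<OWNER>/<REPO>/actions/workflows/promote.yaml/dispatches \
  -d '{"ref": "main", "inputs": {"traceParent": "<span context of the promotion phase>"}}'
```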
### Enable GitHub workflows
We also need to enable GitHub workflows to write to the repo and create pull requests. This is done in the settings of the repository; see the screenshot below:
The GitHub action performing the promotion is implemented in the `.github/workflows/promote.yaml` file located in our GitOps repository:
```yaml
name: promote
on:
  workflow_dispatch:
    inputs:
      traceParent:
        description: 'OTEL parent trace'
        required: false
        type: string

permissions:
  contents: write
  pull-requests: write

jobs:
  promote:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: |
          # configure git client
          git config --global user.email "<email address>"
          git config --global user.name "<name>"

          # create a new branch
          git switch -c production/${{ github.sha }}

          # promote the change
          cp dev/values.yaml production/values.yaml
          echo "traceParent: $TRACE_PARENT" >> production/values.yaml

          # push the change to the new branch
          git add production/values.yaml
          git commit -m "Promote dev to production"
          git push -u origin production/${{ github.sha }}
        env:
          TRACE_PARENT: ${{ inputs.traceParent }}
      - run: |
          gh pr create \
            -B main \
            -H production/${{ github.sha }} \
            --title "Promote dev to production" \
            --body "Automatically created by GHA"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
This action copies the `values.yaml` file from the `dev` stage to the `prod` stage to set the service versions that should be deployed via the Helm chart for that stage.
### Prepare the application namespaces
In this example, the application will be deployed in two different namespaces, each representing a different stage (`dev` and `prod`). To create the namespaces, execute the following commands:
```shell
kubectl create namespace simple-go
kubectl annotate namespace simple-go keptn.sh/lifecycle-toolkit=enabled

kubectl create namespace simple-go-prod
kubectl annotate namespace simple-go-prod keptn.sh/lifecycle-toolkit=enabled
```
The `promotion` task that triggers the action to create a pull request for promoting an ArgoCD `Application` from `dev` to `production` will be executed in the `simple-go` namespace. Therefore, we need to create a secret containing the GitHub personal access token we created earlier, using the following command:
```shell
GH_REPO_OWNER=<YOUR_GITHUB_USER>
GH_REPO=<YOUR_GITHUB_REPO>
GH_API_TOKEN=<YOUR_GITHUB_TOKEN>

kubectl create secret generic github-token -n simple-go \
  --from-literal=SECURE_DATA="{\"githubRepo\":\"${GH_REPO}\",\"githubRepoOwner\":\"${GH_REPO_OWNER}\",\"apiToken\":\"${GH_API_TOKEN}\"}"
```
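Keptn exposes the `SECURE_DATA` key of this secret to the task at runtime. As a rough sketch (not the demo's actual definition), the `promote` `KeptnTaskDefinition` could reference it via `secureParameters`; the script URL here is hypothetical:

```yaml
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnTaskDefinition
metadata:
  name: promote
  namespace: simple-go
spec:
  python:
    httpRef:
      # hypothetical location of the promotion script that calls the GitHub API
      url: https://raw.githubusercontent.com/<OWNER>/<REPO>/main/scripts/promote.py
    secureParameters:
      # exposed to the task container as the SECURE_DATA environment variable
      secret: github-token
```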
### Prepare the ArgoCD Application resources
The next step is to create the ArgoCD `Application` resources in our cluster. Each stage of our application (`dev` and `prod`) is represented by a separate ArgoCD `Application` resource that points to a Helm chart for the respective stage. The Helm charts can be found in our GitOps repository in the following sub-folders:

- `simple-app/chart-dev`: Contains the Helm chart for the application in the `dev` stage
- `simple-app/chart-prod`: Contains the Helm chart for the application in the `prod` stage
The ArgoCD `Applications` are created by applying the following manifest:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: simple-go-app-context
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'https://github.com/bacherfl/keptn-analysis-demo'
    path: simple-app/chart-dev
    targetRevision: HEAD
    helm:
      parameters:
        - name: "commitID"
          value: "$ARGOCD_APP_REVISION"
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: simple-go
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: simple-go-app-context-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'https://github.com/bacherfl/keptn-analysis-demo'
    path: simple-app/chart-prod
    targetRevision: HEAD
    helm:
      parameters:
        - name: "commitID"
          value: "$ARGOCD_APP_REVISION"
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: simple-go-prod
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
This manifest contains the definitions of the two ArgoCD `Applications`, each of which points to one of the Helm charts mentioned earlier. In addition to that, the `$ARGOCD_APP_REVISION` environment variable is used to get access to the Git commit ID that triggered a new deployment of our ArgoCD `Applications`. This ID is passed through to the Helm chart; Keptn uses it as metadata for a `KeptnApp` deployment.
After applying the file using `kubectl apply -f argo-apps.yaml`, ArgoCD begins to synchronize the state of the `Applications`, meaning that the Helm charts for the `Applications` are applied to the cluster. While this is happening, let's have a closer look at the actual content of the Helm charts.
### What's in the Helm chart for the dev stage
Each chart contains two `Deployments`/`Services` (`simple-go-service` and `simple-go-backend`), representing the two `KeptnWorkloads` that are part of our `KeptnApp`. Let's take the `simple-go-service` `Deployment` as an example to see how we prepared it to be managed by Keptn:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-go-service
  namespace: simple-go
spec:
  selector:
    matchLabels:
      app: simple-go-service
  template:
    metadata:
      labels:
        app: simple-go-service
        app.kubernetes.io/name: simple-go-service
        app.kubernetes.io/part-of: simple-go
        app.kubernetes.io/version: {{.Values.serviceVersion}}
        keptn.sh/post-deployment-tasks: wait-for-monitoring
    spec:
      containers:
        - image: bacherfl/simple-go-service:{{.Values.serviceVersion}}
          imagePullPolicy: Always
          name: simple-go-service
          ports:
            - containerPort: 9000
              name: http
              protocol: TCP
```
#### Labels
To correctly associate the `Deployment` resource with the `KeptnWorkload` resource, the following labels are set:

- `app.kubernetes.io/name`: The name of the `KeptnWorkload` that should be associated with the `Deployment`.
- `app.kubernetes.io/part-of`: The name of the `KeptnApp` resource containing the two workloads.
- `app.kubernetes.io/version`: The version of the related `KeptnWorkload`.

For more information about setting these labels for Keptn, see Basic annotations.
#### Pre- and post-deployment tasks
In addition to the labels which define the `KeptnWorkload`, we also use the `keptn.sh/post-deployment-tasks` label to define a post-deployment task for the workload. The task defined here (`wait-for-monitoring`) ensures that the Prometheus target for the workload is available before proceeding with the execution of the load tests of the overall Keptn application.
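The actual task definition ships with the demo repository; purely as an illustration, such a check could be declared with the container runtime of a `KeptnTaskDefinition`, polling Prometheus until an active scrape target shows up (the image, service URL, and match string are assumptions):

```yaml
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnTaskDefinition
metadata:
  name: wait-for-monitoring
  namespace: simple-go
spec:
  container:
    name: wait-for-monitoring
    image: curlimages/curl:8.5.0  # assumed image
    command: ["sh", "-c"]
    args:
      # poll the (assumed) Prometheus API until an active target for the workload appears
      - |
        until curl -sf "http://prometheus.monitoring.svc:9090/api/v1/targets?state=active" | grep -q simple-go; do
          echo "waiting for Prometheus target..."
          sleep 5
        done
```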
#### KeptnAppContext
The `KeptnAppContext` provides two important capabilities for multi-stage delivery:

- Define tasks and evaluations that run before or after the Keptn application deployment
- Add metadata and links to traces for a specific ArgoCD `Application`. This enables you to enrich your traces with additional information that you can use to understand and analyze the performance of your applications.

The `KeptnAppContext` manifest looks as follows:
```yaml
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnAppContext
metadata:
  name: simple-go
  namespace: simple-go
spec:
  preDeploymentTasks:
    - wait-for-prometheus
  postDeploymentTasks:
    - post-deployment-loadtests
    - post-deployment-loadtests-backend
  postDeploymentEvaluations:
    - response-time
  promotionTasks:
    - promote
  metadata:
    commitID: {{.Values.commitID}}
```
This resource contains the list of pre- and post-deployment checks for the complete Keptn application. In the `pre-deployment` phase, the task `wait-for-prometheus` ensures the Prometheus installation in our cluster is available. If this is not the case, it would not be wise to deploy a new version of the application, since we could not observe its performance metrics.

Once all workloads have been deployed, the application enters the `post-deployment` phase, in which load tests against the application are executed. After executing the load tests, a `post-deployment` evaluation is performed to compare the response time measured by the load tests with a threshold you have defined. If this evaluation is successful, the application proceeds into the `promotion` phase. This is the phase where the GitHub personal access token we created earlier is used to trigger the GitHub action that promotes the deployed version into the next stage.
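For reference, a `response-time` evaluation of this kind could be backed by a `KeptnMetric` queried from Prometheus and a `KeptnEvaluationDefinition` comparing it against the threshold from the Helm values; the provider name and query below are assumptions, not the demo's actual definitions:

```yaml
apiVersion: metrics.keptn.sh/v1beta1
kind: KeptnMetric
metadata:
  name: response-time
  namespace: simple-go
spec:
  provider:
    name: prometheus-provider  # assumed KeptnMetricsProvider name
  # assumed query for the 95th percentile response time of the service
  query: "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job='simple-go-service'}[1m])) by (le))"
  fetchIntervalSeconds: 10
---
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnEvaluationDefinition
metadata:
  name: response-time
  namespace: simple-go
spec:
  objectives:
    - keptnMetricRef:
        name: response-time
        namespace: simple-go
      # the threshold comes from the Helm values shown below
      evaluationTarget: "<{{.Values.targetResponseRate}}"
```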
#### Metadata
In addition to the pre-/post-deployment checks and the promotion task, the `KeptnAppContext` also contains a `metadata` property that passes the `commitID` made available by ArgoCD to the application deployment. This information is then added by Keptn as an attribute to the OpenTelemetry traces created for the application deployment.
To configure the application, the `values.yaml` file is used. Within that file, the versions of the two workloads that are part of the application are defined, as well as the target response time for the evaluation in the post-deployment phase. The Git commit ID mentioned earlier is also set here: it is empty by default, but is set automatically by ArgoCD, using the `$ARGOCD_APP_REVISION` environment variable.
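The `dev` chart's `values.yaml` is not reproduced in this post, but judging from its `prod` counterpart shown in the next section, it presumably contains the same fields minus the `traceParent`:

```yaml
serviceVersion: v1
backendServiceVersion: v1
targetResponseRate: "0.50"
commitID: ""
```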
### What's in the Helm chart for the prod stage
The Helm chart of the `prod` stage is rather similar to the one for the `dev` stage, but differs in the `values.yaml` and the `KeptnAppContext` resource. First, let's inspect the `values.yaml` in `prod`:
```yaml
serviceVersion: v1
backendServiceVersion: v1
targetResponseRate: "0.50"
commitID: ""
traceParent: ""
```
#### TraceParent property
The `values.yaml` file for the `prod` stage contains an additional property called `traceParent`, which is essential for linking the deployment traces of the `prod` stage to those of the previous stage, i.e. the `dev` stage. Keptn passes the `traceParent` to the GitHub action performing the promotion, which writes it into the `values.yaml` file, alongside the workload versions that should be deployed in `prod`.
#### spanLinks property
In our example, the value of the `traceParent` is the span ID of the `promotion` phase of the `dev` stage. To pass this property to Keptn, we use the `spanLinks` property of the `KeptnAppContext` resource:
```yaml
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnAppContext
metadata:
  name: simple-go-prod
  namespace: simple-go-prod
spec:
  preDeploymentTasks:
    - wait-for-prometheus
  postDeploymentTasks:
    - post-deployment-loadtests
    - post-deployment-loadtests-backend
  spanLinks:
    - {{.Values.traceParent}}
  metadata:
    commitID: {{.Values.commitID}}
```
This causes the OpenTelemetry deployment trace in `prod` to have a reference to the `promotion` phase in `dev`, indicating that the successful deployment of the application in `dev` is what caused the deployment in `prod`.
## Promotion flow from `dev` to `prod`
Now that the GitOps repository and the ArgoCD applications are set up, let's have a closer look at how a new service version makes its way into `dev` and then into `prod`. To do this, the `values.yaml` file for the `dev` stage is edited to change the service version from `v1` to `v2`:
After this change is committed to the GitOps repository, ArgoCD eventually starts to synchronize the application, and the new service version is deployed to `dev`. This is reflected by a new `KeptnAppVersion` being created by Keptn, for which the pre-/post-deployment checks and the evaluation mentioned earlier are executed. After some time, the new version is up and running in `dev`, and the deployment trace for the new `KeptnAppVersion` is visible in Jaeger:
You can see that the generated trace also contains the `commitID` that triggered the deployment (i.e. the commit in which the version was changed). We also see that the `promotion` phase has been executed successfully, so let's check our GitOps repository and inspect the automatically created pull request to promote the version into the next stage:
As expected, the pull request updates the `values.yaml` file for the `prod` stage, setting the `serviceVersion` to the same value we just deployed in `dev`. In addition to that, the `traceParent` property is set to the span ID of the `promotion` phase of the deployment in `dev`.
Once the PR is merged, Keptn deploys the new version in the `prod` stage, and eventually we see the deployment trace for that stage in Jaeger as well:
As we can see in the deployment trace, we again have the `commitID` that triggered the deployment in that stage, just as in `dev`, but the trace also contains a reference to the span ID of the `promotion` phase in `dev`. This ultimately allows us to trace back the deployment of a particular service version across multiple stages, right to the commit that introduced a change to the affected service.
## Conclusion
Time to wrap up what we have learned in this example. We have seen how the `KeptnAppContext` resource can be used to define pre-/post-deployment checks and to pass important metadata -- in our example, using ArgoCD, the Git commit ID that triggered a new deployment -- to be added as attributes to the deployment traces generated by Keptn.

Then, to gain observability not only for an isolated stage, but across multiple stages, the `spanLinks` property of the `KeptnAppContext` was used to create references to the deployment traces of a previous stage when promoting a new version of a service from one stage to the next.
This way, if any kind of problem appears in one of the later stages (in this example, the `prod` stage) for a newly deployed version, the links to the deployment traces of the previous stages enable us to trace back the deployment of that new version across the previous stages, until we reach the commit that caused the erroneous behavior of that service.
We hope the example in this blog post gives you some inspiration on how you could integrate Keptn into your continuous delivery workflow. If you would like to try out Keptn and its capabilities yourself, feel free to head over to the Keptn docs and follow the guides to install Keptn. We also appreciate any feedback and are always happy to support you with any questions you might have.