# Multi-Stage Delivery using GitOps
In multi-stage environments it can be a challenge to see how a particular version of a workload progresses through the different stages. When something goes wrong in one of the deployment stages, this makes it difficult to trace exactly which modification introduced the problem.
Keptn helps to address this challenge by providing a distributed OpenTelemetry trace that encompasses all deployment stages and contains all relevant information, such as the Git commit ID that triggered the deployment of a workload. For example, if the evaluation of a load test in one of the deployment stages is failing, the distributed trace generated by Keptn contains details about the result of the evaluation, as well as a link to the deployment trace of the previous stage. This makes it easy to trace back the deployment of that particular workload across the previous stages, right until the original commit that resulted in the performance degradation.
This blog post demonstrates an example workflow that automates the promotion of a sample application across two different stages. The deployment traces of those stages are linked together and enriched with valuable metadata, such as the commit ID that triggered the deployment of a new workload version.
## Technologies used for this example
For this, we are using the following technologies:

- The new `KeptnAppContext` resource, which can pass metadata to the generated deployment traces and define a `promotion` task that is executed once the application is deployed and all post-deployment checks have been executed successfully.
- ArgoCD as a GitOps tool. In addition to automatically synchronizing the cluster with the desired state defined in the GitOps repository, ArgoCD also adds metadata (such as the Git commit ID that triggered the last sync) to the `KeptnAppContext` resource.
- GitHub Actions: The GitOps repository is hosted on GitHub, so we can use GitHub Actions to implement the promotion of an artifact from one stage to the next. We do this by running a workflow that creates the pull requests for updating the ArgoCD `Application` resource in the different stages.
- Helm: The configuration of the application for each stage is maintained via two separate Helm charts, one for each stage.
- OpenTelemetry Collector/Jaeger: The deployment traces are gathered by the OpenTelemetry Collector and forwarded to Jaeger, which displays the generated traces graphically.
- Prometheus: Provides monitoring data for the application.
Note that we assume these tools are already installed on the Kubernetes cluster; walking through the installation of each of them would exceed the scope of this blog post.
We are going to do the following:

- Set up the environment by:
  - Setting up a GitHub repository with an access token, GitHub workflows, and GitHub Actions
  - Preparing a Kubernetes namespace for each stage (`dev` and `prod`)
  - Preparing the ArgoCD `Application` resources with an appropriate Helm chart for each
  - Applying labels to associate the `Deployment` resource with the `KeptnWorkload` resource
  - Defining Keptn pre-/post-deployment checks and tasks
  - Defining the metadata to be passed through the deployment traces
  - Defining a `traceParent` that links the deployment traces of the `prod` stage to those of the `dev` stage
- Run the promotion flow by:
  - Creating a pull request to update our `dev` environment
  - Merging the automatically created pull request to promote the updated version into `prod`
  - Inspecting the generated deployment traces for both stages and seeing how they are connected with each other
## Setting up the Environment
Now it's time to set up our environment and connect all the tools mentioned above with each other.
### Set up the GitHub repository
First things first: since we talk about GitOps in this article, we need a Git repository to host the Helm chart of our application. We use GitHub in this example, which allows us to use GitHub Actions to implement the promotion from `dev` to `production`.
In this example, we are using this repository as an upstream repository. If you would like to try the demo yourself, feel free to fork this repository and start experimenting with Keptn from there.
### Create personal access token
We need to create a personal access token for accessing the GitHub API. This token will be used by the container running the `promotion` task during the post-deployment phase of the `KeptnApp` within the `dev` stage. The container uses this access token to trigger a GitHub action that creates a pull request to promote the version that has been deployed from `dev` into `production`.

Using GitHub Actions rather than interacting directly with the Git repository in the container that executes the `promotion` step lets us avoid granting the container any write permissions to the repository. Instead, we use GitHub's fine-grained access tokens to restrict the permissions so that the token can only trigger workflow actions, exclusively within our GitOps repository. The required permissions are highlighted in the screenshot below:
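Under the hood, triggering such a workflow boils down to a `workflow_dispatch` call against the GitHub REST API. As a minimal sketch (the owner/repo placeholders and the `traceParent` value are assumptions; the token is the one created above):

```shell
# Dispatch the promote workflow via the GitHub API.
curl -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${GH_API_TOKEN}" \
  https://api.github.com/repos/<OWNER>/<REPO>/actions/workflows/promote.yaml/dispatches \
  -d '{"ref": "main", "inputs": {"traceParent": "<span context of the promotion phase>"}}'
```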
### Enable GitHub workflows
We also need to enable GitHub workflows to write to the repo and create pull requests. This is done in the settings of the repository; see the screenshot below:
The GitHub action performing the promotion is implemented in the `.github/workflows/promote.yaml` file located in our GitOps repository:
```yaml
name: promote
on:
  workflow_dispatch:
    inputs:
      traceParent:
        description: 'OTEL parent trace'
        required: false
        type: string

permissions:
  contents: write
  pull-requests: write

jobs:
  promote:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: |
          # configure git client
          git config --global user.email "<email address>"
          git config --global user.name "<name>"

          # create a new branch
          git switch -c production/${{ github.sha }}

          # promote the change
          cp dev/values.yaml production/values.yaml
          echo "traceParent: $TRACE_PARENT" >> production/values.yaml

          # push the change to the new branch
          git add production/values.yaml
          git commit -m "Promote dev to production"
          git push -u origin production/${{ github.sha }}
        env:
          TRACE_PARENT: ${{ inputs.traceParent }}
      - run: |
          gh pr create \
            -B main \
            -H production/${{ github.sha }} \
            --title "Promote dev to production" \
            --body "Automatically created by GHA"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
This action copies the `values.yaml` file from the `dev` stage to the `prod` stage to set the service versions that should be deployed via the Helm chart for that stage.
### Prepare the application namespaces
In this example, the application will be deployed in two different namespaces, each representing a different stage (`dev` and `prod`). To create the namespaces, execute the following commands:
```shell
kubectl create namespace simple-go
kubectl annotate namespace simple-go keptn.sh/lifecycle-toolkit=enabled

kubectl create namespace simple-go-prod
kubectl annotate namespace simple-go-prod keptn.sh/lifecycle-toolkit=enabled
```
The `promotion` task that triggers the action to create a pull request for promoting an ArgoCD `Application` from `dev` to `production` will be executed in the `simple-go` namespace. Therefore, we need to create a secret containing the GitHub personal access token we created earlier, using the following command:
```shell
GH_REPO_OWNER=<YOUR_GITHUB_USER>
GH_REPO=<YOUR_GITHUB_REPO>
GH_API_TOKEN=<YOUR_GITHUB_TOKEN>

kubectl create secret generic github-token -n simple-go \
  --from-literal=SECURE_DATA="{\"githubRepo\":\"${GH_REPO}\",\"githubRepoOwner\":\"${GH_REPO_OWNER}\",\"apiToken\":\"${GH_API_TOKEN}\"}"
```
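Keptn exposes the `SECURE_DATA` key of this secret to the task at runtime. As a rough sketch (not the demo's actual definition), the `promote` `KeptnTaskDefinition` could reference it via `secureParameters`; the script URL here is hypothetical:

```yaml
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnTaskDefinition
metadata:
  name: promote
  namespace: simple-go
spec:
  python:
    httpRef:
      # hypothetical location of the promotion script that calls the GitHub API
      url: https://raw.githubusercontent.com/<OWNER>/<REPO>/main/scripts/promote.py
    secureParameters:
      # exposed to the task container as the SECURE_DATA environment variable
      secret: github-token
```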
### Prepare the ArgoCD Application resources
The next step is to create the ArgoCD `Application` resources in our cluster. Each stage of our application (`dev` and `prod`) is represented by a separate ArgoCD `Application` resource that points to a Helm chart for the respective stage. The Helm charts can be found in our GitOps repository in the following sub-folders:

- `simple-app/chart-dev`: Contains the Helm chart for the application in the `dev` stage
- `simple-app/chart-prod`: Contains the Helm chart for the application in the `prod` stage
The ArgoCD `Applications` are created by applying the following manifest:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: simple-go-app-context
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'https://github.com/bacherfl/keptn-analysis-demo'
    path: simple-app/chart-dev
    targetRevision: HEAD
    helm:
      parameters:
        - name: "commitID"
          value: "$ARGOCD_APP_REVISION"
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: simple-go
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: simple-go-app-context-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'https://github.com/bacherfl/keptn-analysis-demo'
    path: simple-app/chart-prod
    targetRevision: HEAD
    helm:
      parameters:
        - name: "commitID"
          value: "$ARGOCD_APP_REVISION"
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: simple-go-prod
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
This manifest contains the definitions of the two ArgoCD `Applications`, each of which points to one of the Helm charts mentioned earlier. In addition to that, the `$ARGOCD_APP_REVISION` environment variable is used to get access to the Git commit ID that triggered a new deployment of our ArgoCD `Applications`. This ID is passed through to the Helm chart; Keptn uses it as metadata for a `KeptnApp` deployment.
After applying the file using `kubectl apply -f argo-apps.yaml`, ArgoCD begins to synchronize the state of the `Applications`, meaning that the Helm charts for the `Applications` are applied to the cluster. While this is happening, let's have a closer look at the actual content of the Helm charts.
### What's in the Helm chart for the dev stage
Each chart contains two `Deployments`/`Services` (`simple-go-service` and `simple-go-backend`), representing the two `KeptnWorkloads` that are part of our `KeptnApp`. Let's take the `simple-go-service` `Deployment` as an example to see how we prepared it to be managed by Keptn:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-go-service
  namespace: simple-go
spec:
  selector:
    matchLabels:
      app: simple-go-service
  template:
    metadata:
      labels:
        app: simple-go-service
        app.kubernetes.io/name: simple-go-service
        app.kubernetes.io/part-of: simple-go
        app.kubernetes.io/version: {{.Values.serviceVersion}}
        keptn.sh/post-deployment-tasks: wait-for-monitoring
    spec:
      containers:
        - image: bacherfl/simple-go-service:{{.Values.serviceVersion}}
          imagePullPolicy: Always
          name: simple-go-service
          ports:
            - containerPort: 9000
              name: http
              protocol: TCP
```
#### Labels
To correctly associate the `Deployment` resource with the `KeptnWorkload` resource, the following labels are set:

- `app.kubernetes.io/name`: The name of the `KeptnWorkload` that should be associated with the `Deployment`.
- `app.kubernetes.io/part-of`: The name of the `KeptnApp` resource containing the two workloads.
- `app.kubernetes.io/version`: The version of the related `KeptnWorkload`.

For more information about setting these labels for Keptn, see Basic annotations.
#### Pre- and post-deployment tasks
In addition to the labels which define the `KeptnWorkload`, we also use the `keptn.sh/post-deployment-tasks` label to define a post-deployment task for the workload. The task defined here (`wait-for-monitoring`) ensures that the Prometheus target for the workload is available before proceeding with the execution of the load tests of the overall Keptn application.
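The actual task definition ships with the demo repository; purely as an illustration, such a check could be declared with the container runtime of a `KeptnTaskDefinition`, polling Prometheus until an active scrape target shows up (the image, service URL, and match string are assumptions):

```yaml
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnTaskDefinition
metadata:
  name: wait-for-monitoring
  namespace: simple-go
spec:
  container:
    name: wait-for-monitoring
    image: curlimages/curl:8.5.0  # assumed image
    command: ["sh", "-c"]
    args:
      # poll the (assumed) Prometheus API until an active target for the workload appears
      - |
        until curl -sf "http://prometheus.monitoring.svc:9090/api/v1/targets?state=active" | grep -q simple-go; do
          echo "waiting for Prometheus target..."
          sleep 5
        done
```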
#### KeptnAppContext
The `KeptnAppContext` provides two important capabilities for multi-stage delivery:

- Define tasks and evaluations that run before or after the Keptn application deployment
- Add metadata and links to traces for a specific ArgoCD `Application`. This enables you to enrich your traces with additional information that you can use to understand and analyze the performance of your applications.

The `KeptnAppContext` manifest looks as follows:
```yaml
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnAppContext
metadata:
  name: simple-go
  namespace: simple-go
spec:
  preDeploymentTasks:
    - wait-for-prometheus
  postDeploymentTasks:
    - post-deployment-loadtests
    - post-deployment-loadtests-backend
  postDeploymentEvaluations:
    - response-time
  promotionTasks:
    - promote
  metadata:
    commitID: {{.Values.commitID}}
```
This resource contains the list of pre- and post-deployment checks for the complete Keptn application. In the `pre-deployment` phase, the task `wait-for-prometheus` ensures the Prometheus installation in our cluster is available. If this is not the case, it would not be wise to deploy a new version of the application, since we could not observe its performance metrics.

Once all workloads have been deployed, the application enters the `post-deployment` phase, in which load tests against the application are executed. After executing the load tests, a `post-deployment` evaluation is performed to compare the response time measured by the load tests with a threshold you have defined. If this evaluation is successful, the application proceeds into the `promotion` phase. This is the phase where the GitHub personal access token we created earlier is used to trigger the GitHub action that promotes the deployed version into the next stage.
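For reference, a `response-time` evaluation of this kind could be backed by a `KeptnMetric` queried from Prometheus and a `KeptnEvaluationDefinition` comparing it against the threshold from the Helm values; the provider name and query below are assumptions, not the demo's actual definitions:

```yaml
apiVersion: metrics.keptn.sh/v1beta1
kind: KeptnMetric
metadata:
  name: response-time
  namespace: simple-go
spec:
  provider:
    name: prometheus-provider  # assumed KeptnMetricsProvider name
  # assumed query for the 95th percentile response time of the service
  query: "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job='simple-go-service'}[1m])) by (le))"
  fetchIntervalSeconds: 10
---
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnEvaluationDefinition
metadata:
  name: response-time
  namespace: simple-go
spec:
  objectives:
    - keptnMetricRef:
        name: response-time
        namespace: simple-go
      # the threshold comes from the Helm values shown below
      evaluationTarget: "<{{.Values.targetResponseRate}}"
```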
#### Metadata
In addition to the pre-/post-deployment checks and the promotion task, the `KeptnAppContext` also contains a `metadata` property that passes the `commitID` made available by ArgoCD to the application deployment. This information is then added by Keptn as an attribute to the OpenTelemetry traces created for the application deployment.
To configure the application, the `values.yaml` file is used. Within that file, the versions of the two workloads that are part of the application are defined, as well as the target response time for the evaluation in the post-deployment phase. The Git commit ID mentioned earlier is also set here: it is empty by default, but is set automatically by ArgoCD, using the `$ARGOCD_APP_REVISION` environment variable.
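The `dev` chart's `values.yaml` is not reproduced in this post, but judging from its `prod` counterpart shown in the next section, it presumably contains the same fields minus the `traceParent`:

```yaml
serviceVersion: v1
backendServiceVersion: v1
targetResponseRate: "0.50"
commitID: ""
```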
### What's in the Helm chart for the prod stage
The Helm chart of the `prod` stage is rather similar to the one for the `dev` stage, but differs in the `values.yaml` and the `KeptnAppContext` resource. First, let's inspect the `values.yaml` in `prod`:
```yaml
serviceVersion: v1
backendServiceVersion: v1
targetResponseRate: "0.50"
commitID: ""
traceParent: ""
```
#### TraceParent property
The `values.yaml` file for the `prod` stage contains an additional property called `traceParent`, which is essential for linking the deployment traces of the `prod` stage to those of the previous stage, i.e. the `dev` stage. Keptn passes the `traceParent` to the GitHub action performing the promotion, which writes it into the `values.yaml` file, alongside the workload versions that should be deployed in `prod`.
#### spanLinks property
In our example, the value of the `traceParent` is the span ID of the `promotion` phase of the `dev` stage. To pass this property to Keptn, we use the `spanLinks` property of the `KeptnAppContext` resource:
```yaml
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnAppContext
metadata:
  name: simple-go-prod
  namespace: simple-go-prod
spec:
  preDeploymentTasks:
    - wait-for-prometheus
  postDeploymentTasks:
    - post-deployment-loadtests
    - post-deployment-loadtests-backend
  spanLinks:
    - {{.Values.traceParent}}
  metadata:
    commitID: {{.Values.commitID}}
```
This causes the OpenTelemetry deployment trace in `prod` to have a reference to the `promotion` phase in `dev`, indicating that the successful deployment of the application in `dev` is what caused the deployment in `prod`.
## Promotion flow from `dev` to `prod`
Now that the GitOps repository and the ArgoCD applications are set up, let's have a closer look at how a new service version makes its way into `dev` and then into `prod`. To do this, the `values.yaml` file for the `dev` stage is edited to change the service version from `v1` to `v2`:
After this change is committed to the GitOps repository, ArgoCD eventually starts to synchronize the application, and the new service version is deployed to `dev`. This is reflected by a new `KeptnAppVersion` being created by Keptn, for which the pre-/post-deployment checks and the evaluation mentioned earlier are executed. After some time, the new version is up and running in `dev`, and the deployment trace for the new `KeptnAppVersion` is visible in Jaeger:
You can see that the generated trace also contains the `commitID` that triggered the deployment (i.e. the commit in which the version was changed). We also see that the `promotion` phase has been executed successfully, so let's check our GitOps repository and inspect the automatically created pull request to promote the version into the next stage:
As expected, the pull request updates the `values.yaml` file for the `prod` stage, setting the `serviceVersion` to the same value we just deployed in `dev`. In addition to that, the `traceParent` property is set to the span ID of the `promotion` phase of the deployment in `dev`.
Once the PR is merged, Keptn deploys the new version in the `prod` stage, and eventually we see the deployment trace for that stage in Jaeger as well:
As we can see in the deployment trace, we again have the `commitID` that triggered the deployment in that stage, just as in `dev`, but the trace also contains a reference to the span ID of the `promotion` phase in `dev`. This ultimately allows us to trace back the deployment of a particular service version across multiple stages, right to the commit that introduced a change to the affected service.
## Conclusion
Time to wrap up what we have learned in this example. We have seen how the `KeptnAppContext` resource can be used to define pre-/post-deployment checks and to pass important metadata -- in our example, using ArgoCD, the Git commit ID that triggered a new deployment -- to be added as attributes to the deployment traces generated by Keptn.

Then, to gain observability not only for an isolated stage, but across multiple stages, the `spanLinks` property of the `KeptnAppContext` was used to create references to the deployment traces of a previous stage when promoting a new version of a service from one stage to the next.
This way, if any kind of problem appears in one of the later stages (in this example, the `prod` stage) for a newly deployed version, the links to the deployment traces of the previous stages enable us to trace back the deployment of that new version across the previous stages, until we reach the commit that caused the erroneous behavior of that service.
We hope the example in this blog post gives you some inspiration on how you could integrate Keptn into your continuous delivery workflow. If you would like to try out Keptn and its capabilities yourself, feel free to head over to the Keptn docs and follow the guides to install Keptn. We also appreciate any feedback and are always happy to support you with any questions you might have.