Deploy a production-ready Kubernetes Cluster on Azure with Terraform

11 min read · Sep 20, 2025

In this guide, you'll learn to create a fully modular and reusable Terraform solution, deploying resources across Azure, Kubernetes, and Cloudflare.

In my previous article, you learned how to set up a Kubernetes cluster and run Plausible Analytics using a series of CLI commands. While that approach works, it isn't ideal. A better, more sustainable solution is to use Terraform. With Terraform, you describe your infrastructure in its desired state, and Terraform figures out the steps required to get there.

I've also used helmfile before, which is great for managing Helm releases. But what I like about Terraform is that it goes further. It doesn't stop at Kubernetes, it gives you a single solution to define all resources across your stack.

Key Benefits

Define desired state - No more manual CLI scripts or configuration drift (your environment always represents your code)
Recoverable - Since you've defined your desired state, it's easy to recover an entire environment with all its settings
Modular Design - Create modules that can be reused across different environments
Environment agnostic - Deploy to Azure, Kubernetes, Cloudflare, and many more platforms
Version control - Track infrastructure changes in version control, maintaining a complete audit trail

Setting up your solution as described in this article extends those benefits with:

Automatic HTTPS - Let's Encrypt certificates for all your exposed applications
Cost optimized - Only a single public IP, a cost-effective VM configuration and no extra VM disks
No configuration drift - infrastructure state centrally stored
Data safety - Azure disks store your data, enabling easy backup and restore via snapshots or backup-vault

Prerequisites

Ensure the following tools are installed on your machine: Azure CLI, Terraform & Kubernetes tools. Or alternatively use Cloud Shell in Azure Portal, where all needed tools are pre-installed. When using Cloud Shell, do mount the clouddrive to persist everything across shell sessions: clouddrive mount & cd clouddrive.

If you're on Windows, please use Cloud Shell or WSL, as all scripts are written in bash.

Getting Started

There's one small manual step to take before we can dive into Terraform. Our solution needs a backend (storage) for the state file, so we first need to create a Storage Account with a container in Azure. There's a handy script in our Terraform project that does this for you. So let's get our Terraform project first.

You can do this quickly by running the following shell script:

download-terraform-project.sh

#!/usr/bin/env bash
# Download and extract the terraform project from the repository
curl -L "https://github.com/jeroenbach/bach.software/archive/refs/heads/main.zip" -o "bach.software-terraform.zip"
unzip -q "bach.software-terraform.zip" "bach.software-main/src/app/examples/post4/terraform/*"

# Move the extracted folder to current directory and remove the zip and extracted folder
mv "bach.software-main/src/app/examples/post4/terraform" "./terraform"
rm -rf "bach.software-terraform.zip" "bach.software-main"

# Navigate to the terraform directory
cd "./terraform"

Before running Terraform or any of the next scripts, always make sure you're logged in to the correct Azure subscription using az login.

Let's create the storage account and container for our Terraform state:

az login
./scripts/create-tfstate-storage.sh

Running Terraform

When running terraform apply, Terraform will ask for some input variables. You can find the needed values in the input-output.tf file in the same folder. You can also create a terraform.tfvars file with the needed values, so you don't have to enter them each time.

Note: Make sure not to check in your .tfvars files to version control

terraform.tfvars

azure_subscription_id = "<azure-subscription-id>"
azure_cluster_name    = "aks-westeu-prod"
cloudflare_api_token  = "<cloudflare-api-token>"
cloudflare_zone_id    = "<cloudflare-zone-id>"
plausible_dns         = "plausible.example.com"
letsencrypt_email     = "[email protected]"

Before starting, make sure you have the required information available. You can create a free Cloudflare account and link it to a DNS you own, or create a new DNS. You can create an API token following these instructions and use the "Edit zone DNS" template. You can find your zone ID following these instructions. You can find your Azure subscription ID following these instructions.

If you don't specify the Cloudflare variables, the DNS won't be updated, but everything else will still work and you'll be shown the IP address (to use to access Plausible) at the end. You do need to create a DNS record with this IP address yourself, as the certificate issuer needs it to validate the DNS record before it can issue a valid certificate.

Now, let's deploy your environment:

# Environment name: Azure Kubernetes Service - Western Europe - Production
cd aks-westeu-prod
terraform init
terraform apply

Terraform will:

Deploy the AKS cluster
Install Plausible via Helm
Update Cloudflare DNS

Backup & Restore (Optional)

In your rg-nodes-aks-westeu-prod resource group, you'll find the two Azure disks that contain all the data of the Plausible solution: pv-disk-plausible-analytics-v3-clickhouse-0 & pv-disk-plausible-analytics-v3-postgresql-0. You can create hourly, daily, or weekly backups of those disks using Azure Backup Vault.

To restore a backup, create a snapshot of the specific backup to a resource group and fill in the snapshot IDs in the following variables found in the aks-westeu-prod/input-output.tf file: postgresql_restore_snapshot_id and clickhouse_restore_snapshot_id.

Next time you run terraform apply, Plausible will be restored with the backups.

Destroying the Environment

To destroy the environment and all associated resources, you can run the following command:

terraform destroy

Solution Structure

To make the solution run from beginning to end, there were some hurdles to overcome. In this chapter, I'll examine those hurdles and how I've solved them, but first let me provide a general overview of the solution.

terraform/
├── aks-westeu-prod/
│   ├── app-plausible.tf
│   ├── aks-cluster.tf
│   └── provider.tf
├── helm-charts/
│   ├── letsencrypt-cert-issuer/
│   │   ├── templates/
│   │   │   ├── letsencrypt-cluster-issuer-staging.yaml
│   │   │   └── letsencrypt-cluster-issuer.yaml
│   │   ├── Chart.yaml
│   │   └── values.yaml
├── modules/
│   ├── aks-cluster/
│   │   ├── aks-cluster.tf
│   │   ├── ingress-and-certificates.tf
│   │   └── input-output.tf
│   ├── persistent-azure-disk-volume/
│   │   ├── input.tf
│   │   └── persistent-azure-disk-volume.tf
│   └── plausible/
│   │   ├── disks.tf
│   │   ├── input.tf
│   │   ├── namespace.tf
│   │   └── plausible.tf
├── scripts/
│   ├── create-tfstate-storage.sh
│   └── download-terraform-project.sh

aks-westeu-prod: A production environment configuration for deploying to Azure West Europe. You can use this folder as a template to create more environments. The files prefixed with app- show the different applications installed in the cluster.
helm-charts: Custom Helm charts
- letsencrypt-cert-issuer: Instead of deploying the ClusterIssuer resources separately, I packaged them in a Helm chart
modules: Each module encapsulates a specific responsibility
- aks-cluster: Deploys an AKS cluster with Let's Encrypt certificate issuer, nginx ingress as load balancer, and waits for the public IP to be available
- persistent-azure-disk-volume: Creates an Azure disk or restores one using a snapshot and then creates a persistent volume and persistent volume claim in Kubernetes
- plausible: Installs Plausible and its dependencies via Helm

Hurdle: Connection details of the new cluster not yet available

After creating the Kubernetes cluster, we want to be able to deploy resources to it. But at the Terraform plan stage, information on how to connect to this new environment is not yet available. Therefore, we had to take two steps to create a seamless deployment.

Dynamic Provider Configuration: The AKS cluster's information is dynamically set for the Helm and Kubernetes providers by retrieving the connection information from the newly created cluster:

aks-westeu-prod/provider.tf

provider "helm" {
  kubernetes = {
    # Use dynamic provider configuration to use the newly created cluster directly
    host                   = module.aks_cluster.kube_config.host
    client_certificate     = base64decode(module.aks_cluster.kube_config.client_certificate)
    client_key             = base64decode(module.aks_cluster.kube_config.client_key)
    cluster_ca_certificate = base64decode(module.aks_cluster.kube_config.cluster_ca_certificate)
  }
}

provider "kubernetes" {
  # Use dynamic provider configuration to use the newly created cluster directly
  host                   = module.aks_cluster.kube_config.host
  client_certificate     = base64decode(module.aks_cluster.kube_config.client_certificate)
  client_key             = base64decode(module.aks_cluster.kube_config.client_key)
  cluster_ca_certificate = base64decode(module.aks_cluster.kube_config.cluster_ca_certificate)
}

Set the local kubectl context: After the AKS cluster is created, we write the new kube config and set the kubectl context on the local machine, this way local-exec commands can immediately connect to the new cluster.

modules/aks-cluster/aks-cluster.tf

resource "null_resource" "set_kube_context" {
  provisioner "local-exec" {
    command = <<EOT
      # We get it from the Terraform state and add it to the kubeconfig
      echo '${azurerm_kubernetes_cluster.aks_cluster.kube_config_raw}' > ~/.kube/config
      export KUBECONFIG=~/.kube/config
      kubectl config use-context ${azurerm_kubernetes_cluster.aks_cluster.name}
    EOT
  }

  // Always set the kube context when running apply, even if no changes were made to the cluster
  triggers = {
    always_run = "${timestamp()}"
  }

  depends_on = [azurerm_kubernetes_cluster.aks_cluster]
}

Hurdle: Load Balancer IP not yet available

When deploying a helm release, terraform finishes before the release is completely deployed. It also doesn't provide the load balancer IP information. Therefore I implemented two local scripts that wait for the nginx ingress deployment and collect the load balancer IP, which is needed to update your DNS.

modules/aks-cluster/ingress-and-certificates.tf

# Wait for the ingress-nginx helm release to be deployed
resource "null_resource" "wait_for_ingress_nginx" {
  provisioner "local-exec" {
    command = <<EOT
      for i in {1..30}; do
        kubectl get svc -n ingress-nginx ${helm_release.ingress_nginx.name}-controller && sleep 30 && break || sleep 30;
      done
    EOT
  }

  depends_on = [helm_release.ingress_nginx]
}

# Get external IP using kubectl
data "external" "ingress_external_ip" {
  program = ["bash", "-c", <<EOT
    EXTERNAL_IP=$(kubectl get svc -n ingress-nginx ${helm_release.ingress_nginx.name}-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || echo "")
    echo "{\"ip\":\"$EXTERNAL_IP\"}"
  EOT
  ]

  depends_on = [null_resource.wait_for_ingress_nginx]
}

Hurdle: The data in our solution is not safe

When using the plausible helm chart, it creates two databases: PostgreSQL and ClickHouse. By default, these databases use ephemeral storage, which means that when the pod is deleted or rescheduled, all data is lost. To make sure our data is safe, we need to use persistent storage. In a cloud environment like Azure, we can use Azure Disks for this.

I've created a module to create or restore an Azure disk and hook it into Kubernetes by creating a persistent volume and persistent volume claim.

This is how you can use the module and link it into your Plausible Helm deployment.

module "create_pv_postgresql" {
  source                    = "../persistent-azure-disk-volume"
  snapshot_id               = var.postgresql_restore_snapshot_id
  azure_location            = var.azure_disk_location
  pvc_namespace             = var.namespace
  pv_name                   = "pv-disk-${var.name}-postgresql-0"
  pvc_name                  = "pvc-disk-${var.name}-postgresql-0"
  azure_resource_group_name = var.azure_disk_resource_group_name
  disk_size_gb              = var.plausible_config_disk_size # Keep this equal to the size defined in the plausible helm chart

  depends_on = [kubernetes_namespace.plausible_analytics]
}

# the existingClaim is set to the pvc_name for both postgresql and clickhouse
postgresql:
  primary:
    persistence:
      enabled: true
      existingClaim: pvc-disk-${var.name}-postgresql-0
      size: ${var.plausible_config_disk_size}Gi # This database is only used for settings and user data, so it doesn't need to be very large

...

clickhouse:
  persistence:
    enabled: true
    existingClaim: pvc-disk-${var.name}-clickhouse-0
    size: ${var.plausible_data_disk_size}Gi # This database is used for storing all the analytics data, so it needs to be larger

Hurdle: The Plausible Helm Release is not exposed

When deploying Plausible via Helm, it doesn't expose the service by default. To make it accessible from the internet, we need to configure an ingress resource. When configuring the ingress, we can also specify the cert-manager annotation to ensure the certificate is created.

modules/plausible/plausible.tf

ingress:
  enabled: true
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-production"
    kubernetes.io/ingress.class: nginx
    kubernetes.io/tls-acme: "true"
  className: nginx
  hosts:
    - ${var.plausible_dns}
  path: /
  pathType: Prefix
  tls:
    - secretName: letsencrypt-production
      hosts:
        - ${var.plausible_dns}

Hurdle: Can't restore plausible from a backup when the environment already exists

When changing the snapshot IDs in the terraform.tfvars file, Terraform doesn't recreate the plausible helm release or persistent volume claims, because it doesn't see any changes in them. This prevents the deletion & recreation of the azure disks and persistent volumes, because they're never unbound. I therefore added a null_resource that triggers a replacement of the plausible release and the persistent volume claims when the snapshot IDs change. This way you can specify new snapshot IDs and have the resources recreated without manual intervention.

resource "null_resource" "snapshot_trigger" {
  triggers = {
    postgresql_snapshot = var.postgresql_restore_snapshot_id
    clickhouse_snapshot = var.clickhouse_restore_snapshot_id
  }
}
...

  lifecycle {
    replace_triggered_by = [
      null_resource.snapshot_trigger
    ]
  }

resource "null_resource" "snapshot_trigger" {
  triggers = {
    snapshot = var.snapshot_id
  }
}
...

  lifecycle {
    replace_triggered_by = [
      null_resource.snapshot_trigger
    ]
  }

Hurdle: unnecessary costs in our AKS cluster

In my previous article Ditching the Cookie Banners: Run Plausible Analytics on Azure Kubernetes, you learned a few tricks to reduce the costs of your AKS cluster. These are incorporated in this solution as well.

Use ephemeral disks: These are stored directly on the VM's local storage and come at no additional cost.
Standard_B2s configuration: The most cost-effective VM configuration available
Increase the number of pods per node: To allow more workloads on the Standard_B2s instance

Final Thoughts

We've successfully transformed the bash scripts from our previous article into a production-grade Kubernetes deployment on Azure. By leveraging Terraform's declarative approach and AKS's managed infrastructure, you now have a Plausible Analytics instance that's not just running—it's scalable, maintainable, and ready for real-world traffic.

The beauty of this Infrastructure as Code approach is in its repeatability. Need a staging environment? Just duplicate the aks-westeu-prod folder with different variables. Want to deploy to another region? Change a single parameter. Every infrastructure decision is documented in code, reviewed through pull requests, and rolled back if needed.

While this setup might seem like overkill for a simple analytics tool, the patterns you've learned here (modularized Terraform, cert-manager integration, proper secret management) will serve you well for any production Kubernetes workload.