Setting up Airflow on your Minikube Cluster using Helm

2023-02-06

Prerequisites

  • Basic Understanding of Kubernetes and Helm
  • Basic Understanding of Apache Airflow and its components

Set up the following on your local system

  • Docker
  • Minikube
  • kubectl
  • Helm
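
As a quick sanity check, you can confirm each tool is installed by printing its version (output will vary with your setup):

docker --version
minikube version
kubectl version --client
helm version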

Why Helm

  • Helm provides automated lifecycle hooks for installing, upgrading and rolling back applications.
  • Helm keeps the live state of a release consistent with the declared chart, so users do not have to manage an application and its dependencies manually.
  • Helm alleviates the issue that static Kubernetes resource files cannot be parameterized by introducing values and templates (see the sketch below).
    The ability to dynamically generate declarative resource files makes it simpler to create YAML-based resources while still ensuring that applications are created in an easily reproducible manner.
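
As a minimal illustration of this templating, helm template renders a chart locally with whatever values you pass. The commands below use the same apache-airflows repo we add later in this post; overriding webserver.replicas changes the replicas field in the rendered manifests:

helm repo add apache-airflows https://airflow.apache.org
helm template apache-airflows/airflow --set webserver.replicas=2 | grep "replicas:"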

Helm Chart for Apache Airflow

To set up Airflow on your local Minikube Cluster, follow the steps below

Bake DAGs in Docker image

  • Create a folder named dags and add a file named simple-dag.py with the code below
from datetime import datetime
from airflow import DAG
# the airflow.operators.python_operator module is deprecated in Airflow 2
from airflow.operators.python import PythonOperator


# Task callable: adds two integers and returns the result
def add_two_numbers():
    x = 6 + 10
    return x


dag = DAG('add_num',
          description='Add two integers',
          schedule_interval='0 12 * * *',
          start_date=datetime(2022, 6, 23),
          catchup=False)

add_operator = PythonOperator(task_id='add_num_task',
                              python_callable=add_two_numbers,
                              dag=dag)
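
Before baking the file into an image, you can sanity-check it. Assuming you have Airflow installed locally (or are inside any Airflow container), importing the file catches syntax errors, and airflow tasks test runs the task once without involving the scheduler:

python dags/simple-dag.py
airflow tasks test add_num add_num_task 2022-06-23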


  • Create a Dockerfile and add the code below
FROM apache/airflow:2.3.0
COPY ./dags/ /opt/airflow/dags
  • Create an account on Docker Hub
  • Create a new repository; I am going to name mine airflow-dag
  • Build and push the DAG Container
docker login
docker build -t lindaindicina/airflow-dag .
docker push lindaindicina/airflow-dag
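
To confirm the DAG file actually made it into the image, you can list the DAG folder inside a throwaway container (using the image name from the build above); you should see simple-dag.py in the output:

docker run --rm lindaindicina/airflow-dag ls /opt/airflow/dags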

Start Minikube Server

  • Create a file named custom-values.yml and add the code below
images:
  airflow:
    repository: "lindaindicina/airflow-dag"
    pullPolicy: IfNotPresent
    tag: ""

webserverSecretKey: f50c34d5169d5aa3161684bc8bc6cb67

webserver:
  replicas: 1
  service:
    type: NodePort
  defaultUser:
    enabled: true
    email: oluchi@fakeemail.com
    role: Admin
    username: airflow
    password: airflow

flower:
  enabled: true

dags:
  persistence:
    # Enable persistent volume for storing dags
    enabled: true
    # Volume size for dags
    size: 1Gi
    # If using a custom storageClass, pass name here
    storageClassName: "default"
    # access mode of the persistent volume
    accessMode: ReadWriteOnce
    ## the name of an existing PVC to use
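
A note on webserverSecretKey: rather than reusing the example value above, you can generate your own random hex string, for instance with openssl:

openssl rand -hex 16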
  • Run the following commands in your terminal
minikube start
export RELEASE_NAME=local-airflow
export KUBE_NAMESPACE=airflow
kubectl create namespace $KUBE_NAMESPACE
helm repo add apache-airflows https://airflow.apache.org
helm upgrade -f custom-values.yml --install $RELEASE_NAME apache-airflows/airflow --set images.airflow.tag=latest --version 1.6.0 --namespace $KUBE_NAMESPACE
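
Once the helm upgrade --install command finishes, you can confirm the release deployed successfully (the release and namespace names are the ones exported above):

helm ls --namespace $KUBE_NAMESPACE
helm status $RELEASE_NAME --namespace $KUBE_NAMESPACE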

Restart the Scheduler (in order to pick up the DAG image)

First, list the pods in the airflow namespace to find the webserver pod's name:

kubectl get pods -n airflow

Output:

NAME                                       READY   STATUS    RESTARTS        AGE
local-airflow-flower-74bf5c4db5-8qhjs      1/1     Running   2 (2m49s ago)   6m6s
local-airflow-postgresql-0                 1/1     Running   0               6m6s
local-airflow-redis-0                      1/1     Running   0               6m6s
local-airflow-statsd-6b76c6b75b-7md5j      1/1     Running   0               6m6s
local-airflow-webserver-59d6b54584-t8wpf   1/1     Running   0               6m6s

Get a shell to the running webserver container (substituting your own pod name), then run airflow scheduler inside it:

kubectl exec --stdin --tty --namespace airflow local-airflow-webserver-59d6b54584-t8wpf -- /bin/bash

Output:

airflow@local-airflow-webserver-59d6b54584-t8wpf:/opt/airflow$ airflow scheduler
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2022-06-19 23:14:04,394] {scheduler_job.py:693} INFO - Starting the scheduler
[2022-06-19 23:14:04,394] {scheduler_job.py:698} INFO - Processing each file at most -1 times
[2022-06-19 23:14:04,511] {executor_loader.py:106} INFO - Loaded executor: CeleryExecutor
[2022-06-19 23:14:04,519] {manager.py:156} INFO - Launched DagFileProcessorManager with pid: 53
[2022-06-19 23:14:04,523] {scheduler_job.py:1218} INFO - Resetting orphaned tasks for active dag runs
[2022-06-19 23:14:04,529] {settings.py:55} INFO - Configured default timezone Timezone('UTC')
[2022-06-19 23:14:04,532] {settings.py:540} INFO - Loaded airflow_local_settings from /opt/airflow/config/airflow_local_settings.py .
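
Running airflow scheduler by hand works, but if your deployment already includes a scheduler pod, an alternative is to restart its Deployment so it picks up the new DAG image. The deployment name below follows the chart's <release>-scheduler naming convention, so adjust it if yours differs:

kubectl rollout restart deployment/local-airflow-scheduler --namespace airflow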
  • Expose the webserver by running the command below, then visit 127.0.0.1:8080 in your browser
kubectl port-forward svc/local-airflow-webserver 8080:8080 --namespace airflow
  • Log in with airflow as both the username and the password (as set in custom-values.yml)
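
Since the webserver service was set to NodePort in custom-values.yml, an alternative to port-forwarding is to let Minikube print the service URL directly:

minikube service local-airflow-webserver --namespace airflow --url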

[Screenshot: Airflow UI]
[Screenshot: Airflow DAG]

Source Code on GitHub

Hello 👋, if you enjoyed this article, please consider subscribing to my email newsletter. Subscribe 📭