Free Coder

Reputation: 487

Integration of Kubernetes with Apache Airflow

We are building a workflow scheduling application. We found Airflow to be a good option for the workflow manager and Kubernetes a good option for the cluster manager. The intended flow is:

  1. We submit a workflow DAG to Airflow.
  2. Airflow should submit the tasks of the given DAG to Kubernetes, specifying a Docker image for each.
  3. Kubernetes should execute each task by running a Docker container on an available EC2 worker node of the cluster.

While searching, we found that Airflow has Operators for integrating with ECS and Mesos, but not for Kubernetes. We did find a request for a Kubernetes Operator on the Airflow wiki, but no further update on it.

So, simply put, the question is: how do we integrate Airflow with Kubernetes?

Upvotes: 12

Views: 6889

Answers (2)

Marc Lamberti

Reputation: 821

There are two ways of using Apache Airflow with Kubernetes.
The first is to use an Operator, the KubernetesPodOperator:

  • It executes a specific task in a Kubernetes Pod where the Kubernetes cluster is external
  • It allows you to deploy arbitrary Docker images
  • You basically offload dependencies to containers (which is great!)

Or by using the KubernetesExecutor:

  • A new Pod is created for every task instance
  • You can customise your tasks (resource allocation)
  • As with the KubernetesPodOperator, you offload dependencies to containers
  • You make your Airflow cluster dynamic! No more idle nodes wasting resources like with the Celery Executor.
  • Your Airflow cluster becomes fault tolerant (state recovery)
  • and so on
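
To make the "one Pod per task, with its own image and resources" idea concrete, here is a minimal sketch of the kind of Pod manifest that either approach ends up handing to Kubernetes. It is plain Python with no Airflow dependency and is not Airflow's actual code. The function name `pod_manifest_for_task`, the DAG and task ids, the image, and the resource values are all made up for illustration.

```python
import json
import re


def pod_manifest_for_task(dag_id, task_id, image, command, cpu="500m", memory="256Mi"):
    """Build a Kubernetes Pod manifest (as a dict) for one task instance.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    # Pod names must be lowercase alphanumerics or '-', so sanitise the ids.
    raw_name = f"{dag_id}-{task_id}".lower()
    name = re.sub(r"[^a-z0-9-]+", "-", raw_name).strip("-")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Assumes the ids are already valid Kubernetes label values.
            "labels": {"dag_id": dag_id, "task_id": task_id},
        },
        "spec": {
            # A task runs once; Airflow, not Kubernetes, decides on retries.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "task",
                    "image": image,  # dependencies live in the image
                    "command": list(command),
                    # Per-task resource allocation
                    "resources": {"requests": {"cpu": cpu, "memory": memory}},
                }
            ],
        },
    }


manifest = pod_manifest_for_task(
    "example_dag",
    "extract_data",
    "python:3.11-slim",
    ["python", "-c", "print('hello')"],
    cpu="1",
    memory="512Mi",
)
print(json.dumps(manifest, indent=2))
```

With the KubernetesPodOperator you describe this per task in the DAG; with the KubernetesExecutor, Airflow generates a Pod like this for every task instance it schedules.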

For a quick experiment, you can follow the tutorial I just made right here: https://marclamberti.com/blog/airflow-kubernetes-executor/

I hope it helps :)
Cheers

Upvotes: 2

gurooj

Reputation: 2110

This is in flight right now. You can follow along with this major Jira ticket.

One of the more stable branches (the work is being led by much of this team) is in the bloomberg fork on GitHub, in the airflow-kubernetes-executor branch, though it is in the process of being rebased onto a constantly moving Airflow master.

I have a branch on my fork, called frankensteins-monster, that addresses many of the short-term issues and runs well enough. Use it at your own risk, though it works for me right now. I am building a Docker image using the build.sh script located in scripts/ci/kubernetes/docker.

Good luck!

Upvotes: 12
