EranH

Reputation: 135

Apache Airflow working with multi teams (Multi Tenant)

I'm using Airflow and want to add another team to my instance. The new team should see only their own DAGs: a DAG should be visible to a user only if one of that user's team members owns it. I'm using LDAP authentication, and I want each team to maintain their own DAGs. I also want a way for teams to upload DAGs without giving them direct access to the DAG folder, perhaps by using git to push DAGs to the target automatically. Any guidance will be much appreciated!

Upvotes: 4

Views: 6619

Answers (1)

Jarek Potiuk

Reputation: 20097

There is currently (Airflow 2.1) no way to prevent anyone who can write DAGs from accessing anything in the instance. Airflow does not (yet) have a true multi-tenant setup that provides this kind of isolation. This is in the works, but it will likely not arrive (fully) until Airflow 3. Elements of it will appear in Airflow 2 in the coming months, so you will likely be able to configure more and more isolation if you want.

For now, Airflow 2 has introduced partial isolation compared to 1.10:

  1. Parsing the DAGs is separated from the Scheduler, so erroneous/malicious DAGs cannot impact the scheduling process directly.

  2. The Webserver no longer executes DAG code at all.

Currently, whoever writes DAGs can:

  • access the DB of Airflow directly and do anything in the database (including dropping the whole database)
  • read any configuration variables and connections and secrets
  • dynamically change the definition of any DAGs/Tasks running in Airflow by manipulating the DB

And there is no way to prevent it (by design).

All of these are planned to be addressed in the coming months.

This basically means that you have to have a certain level of trust in the users who write the DAGs. Full isolation cannot be achieved, so you should rely on code reviews of DAGs submitted to production to prevent any kind of abuse (much as with any code that developers submit to your code-base).

The only "true" isolation you can currently achieve is by deploying several Airflow instances, each with its own database, scheduler, and webserver. This is actually not as bad as it seems if you have a Kubernetes cluster and use the official Helm Chart of Airflow: https://airflow.apache.org/docs/helm-chart/stable/index.html. You can easily create several Airflow instances, each in a different namespace and each using its own database schema (so you can still use a single database server, but each instance must have its own separate schema). Each Airflow instance will then have its own workers, which can have different authentication (either via connections or via other mechanisms).
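As a rough illustration of the one-instance-per-tenant idea, here is a minimal Python sketch that builds one `helm upgrade --install` command per team. It only prints the commands; it does not run them. The tenant names, the Git repository URLs, the database host, and the `airflow-<tenant>` / `airflow_<tenant>` naming scheme are all hypothetical. It also assumes a separate database per tenant on a shared server (rather than a schema), that the chart repository was added under the alias `apache-airflow`, and that the chart value names used below match your chart version, so check them against the chart's `values.yaml`. Database credentials are deliberately left out.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    name: str       # hypothetical team name, reused in namespace and DB names
    dags_repo: str  # hypothetical Git repository holding that team's DAGs


def helm_command(tenant: Tenant, db_host: str) -> list[str]:
    """Build the Helm command that installs one isolated Airflow instance."""
    namespace = f"airflow-{tenant.name}"
    # Assumed chart values: external metadata DB plus git-sync for DAG delivery.
    values = {
        "postgresql.enabled": "false",  # do not deploy the bundled database
        "data.metadataConnection.host": db_host,
        "data.metadataConnection.db": f"airflow_{tenant.name}",
        "dags.gitSync.enabled": "true",
        "dags.gitSync.repo": tenant.dags_repo,
    }
    cmd = [
        "helm", "upgrade", "--install", namespace, "apache-airflow/airflow",
        "--namespace", namespace, "--create-namespace",
    ]
    for key, value in values.items():
        cmd += ["--set", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    tenants = [
        Tenant("analytics", "https://git.example.com/analytics/dags.git"),
        Tenant("marketing", "https://git.example.com/marketing/dags.git"),
    ]
    for tenant in tenants:
        # Print a copy-pasteable shell command for each tenant.
        print(shlex.join(helm_command(tenant, "postgres.example.internal")))
```

Pulling each team's DAGs from its own Git repository is one possible way to let teams publish DAGs without direct access to the DAG folder, as asked in the question. It does not replace code review of what gets merged into that repository.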

You can even provide common authentication mechanisms. For example, you could put Keycloak in front of Airflow and integrate OAuth/LDAP authentication with your common auth approach for all such instances (and, for example, have different groups of employees authorized for different instances).

This provides nice multi-tenant manageability and some level of resource re-use (database, K8S cluster nodes). If you have, for example, Terraform scripts to manage your infrastructure, you can make this easy to manage so that tenants can be added or removed easily. The isolation between tenants is even better, because you can separately manage the resources used (number of workers, schedulers etc.) for each tenant.

If you are serious about isolation and multi-tenant management, I heartily recommend that approach. Even when we achieve full isolation in Airflow 3, you will still have to manage the "resource" isolation between tenants, and having multiple Airflow instances is one way that makes it very easy (so it will also remain a valid and recommended way of implementing multi-tenancy in some scenarios).

UPDATE (January 2022): We started to discuss an improvement in Airflow to allow a multi-tenant setup. This is all under the Airflow Improvement Proposal-1 "AIP-1: Improve Airflow Security" umbrella. For now there are two AIPs we are working on, AIP-42 and AIP-43, that will bring Airflow multi-tenancy closer to reality. See more: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-1%3A+Improve+Airflow+Security

Upvotes: 19
