Kyle Bridenstine

Reputation: 6393

Limit Airflow DAG Visibility By AD/LDAP Groups

Is it possible to limit the visibility and accessibility of DAGs by user group in Airflow?

For example, I want to have one large Airflow environment for my entire company, and different teams will use it for their own workflows. Say we have team A and team B, who belong to their respective AD/LDAP groups, group A and group B. Is it possible for group A to see only the DAGs that belong to their team, and vice versa for group B?

Based on my research and understanding, I don't think this is possible in a single Airflow environment. I think I will need to create a separate Airflow environment for each team, so that each team has its own Airflow DAGs folder containing its respective DAGs.

Upvotes: 13

Views: 9070

Answers (2)

dinigo

Reputation: 7438

Since you can have an array of owners, another option would be:

  1. Define a variable with your group as a JSON array of usernames
// Variable: group-datascience
["dinigo","michael","david"]
  2. In your new DAG, set this variable as the owner
from airflow import DAG
from airflow.models import Variable

dag = DAG(
  dag_id='dag-with-group-scope',
  # owner is a task-level argument, so it goes in default_args
  default_args={'owner': Variable.get('group-datascience', deserialize_json=True)},
  # some more config
)
  3. Activate the owner filtering in airflow.cfg or via an environment variable
  4. With a custom role, remove the permission to view/edit variables from those people. Otherwise they could modify the variable whenever they want
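
For step 3, Airflow reads configuration overrides from environment variables named AIRFLOW__{SECTION}__{KEY}, so the filter_by_owner option in the [webserver] section can be switched on that way. Here is a minimal sketch; the helper airflow_env_var is a made-up name for illustration and is not part of Airflow:

```python
import os


def airflow_env_var(section, key):
    """Build the environment variable name Airflow reads for [section] key.

    Hypothetical helper for illustration; not an Airflow function.
    """
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


# Equivalent to `filter_by_owner = True` under [webserver] in airflow.cfg.
name = airflow_env_var("webserver", "filter_by_owner")
os.environ[name] = "True"
print(name)
```

In practice you would normally export the variable in the shell or service definition that starts the webserver; setting it from Python only affects the current process and its children.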

This is only cosmetic and it's highly hackable, since anyone can query all the variables from a task (can they?). But it can help you organize your DAGs.

Upvotes: 0

Taylor D. Edmiston

Reputation: 13036

I think there are two different problems posed here:

First, LDAP authentication. Airflow provides support for LDAP authentication built on ldap3. The example in the linked doc shows how to associate Airflow roles with LDAP groups (e.g., the data_profiler_filter part).
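
To make the LDAP part more concrete, here is a rough sketch of the airflow.cfg sections involved in Airflow 1.x, embedded in a Python string and parsed with configparser so the keys can be inspected. The hostname, DNs, group names, and password are placeholders I made up; substitute your own directory's values.

```python
import configparser

# Sketch of the airflow.cfg sections used for LDAP auth in Airflow 1.x.
# Hostnames, DNs, and group names below are placeholders, not real values.
AIRFLOW_CFG = """
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.ldap_auth

[ldap]
uri = ldaps://ldap.example.com:636
user_filter = objectClass=*
user_name_attr = uid
group_member_attr = memberOf
superuser_filter = memberOf=CN=airflow-super-users,OU=Groups,DC=example,DC=com
data_profiler_filter = memberOf=CN=airflow-data-profilers,OU=Groups,DC=example,DC=com
bind_user = cn=Manager,dc=example,dc=com
bind_password = insecure
basedn = dc=example,dc=com
cacert = /etc/ca/ldap_ca.crt
search_scope = LEVEL
"""

cfg = configparser.ConfigParser()
cfg.read_string(AIRFLOW_CFG)
ldap = cfg["ldap"]

# The two *_filter keys are what tie Airflow roles to LDAP groups.
print(ldap["superuser_filter"])
print(ldap["data_profiler_filter"])
```

Note that this only maps LDAP groups to the superuser and data profiler roles; it does not restrict which DAGs a group can see.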

Second, restricting DAG access by group. As of the time of this writing, the current version of Airflow (1.9) doesn't support limiting visibility of DAGs by group. The recent work on role-based access control (RBAC) changes this. I've listed 3 different options for addressing this problem below.


Option 1 - RBAC (most control, available in Airflow ≥ 1.10)

The new RBAC features add support for permissions like this and are the best option for fine-grained control. They use a permission system built on Flask-AppBuilder. The work was started by a company with a use case very similar to the one you mentioned, which is discussed in more detail in the Jira issue.


The RBAC webserver UI is available on master now in airflow/www_rbac. Other features around RBAC are also being actively developed to further improve security in a multi-tenancy setup.

Note: There's also ongoing work on a new DAG-level access control (DLAC) feature in AIRFLOW-2267 which builds upon the RBAC work to introduce even more fine-grained control. More info can be found in the design doc and PR #3197.


Option 2 - Multi-tenancy with owners (simplest, available in Airflow < 1.10)

A second option you can consider for medium-grained control is a multi-tenancy setup using webserver.filter_by_owner and setting one explicit owner (a user, not a group) for each DAG. "With this, a user will see only the dags which it is owner of, unless it is a superuser."
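
To illustrate the behavior described in that quote, here is a small pure-Python model of owner-based filtering. It is only a sketch of the rule, not Airflow's actual implementation; Dag, User, and visible_dags are hypothetical names I made up for the example.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration; these are not Airflow classes.
Dag = namedtuple("Dag", ["dag_id", "owner"])
User = namedtuple("User", ["username", "is_superuser"])


def visible_dags(dags, user, filter_by_owner=True):
    """Return the DAGs a user would see under owner-based filtering."""
    # Superusers, or everyone when filtering is off, see every DAG.
    if not filter_by_owner or user.is_superuser:
        return list(dags)
    # Otherwise a user sees only the DAGs they own.
    return [d for d in dags if d.owner == user.username]


dags = [Dag("team_a_etl", "alice"), Dag("team_b_report", "bob")]
alice = User("alice", False)
admin = User("admin", True)

print([d.dag_id for d in visible_dags(dags, alice)])
print([d.dag_id for d in visible_dags(dags, admin)])
```

The match is on a single owner name per DAG, which is why this option works per user rather than per group.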

Aside: A related feature you might be interested in is running tasks as a specific user with impersonation, using run_as_user or core.default_impersonation.


Option 3 - Run multiple separate Airflow instances (highest isolation)

A third option for coarse-grained control that some companies choose is to run multiple separate Airflow instances, one per team. This is probably the most practical for those looking to run multiple teams' DAGs in isolation today. If you happen to use Astronomer Enterprise, we support spinning up multiple Airflow instances.

Upvotes: 18
