Tsume

Reputation: 897

How to work with configuration files in Airflow

In Airflow, we've created several DAGs, some of which share common properties, for example the directory to read files from. Currently, these properties are duplicated in each separate DAG, which will obviously become problematic in the future: if the directory name were to change, we'd have to go into each DAG and update that piece of code (and possibly miss one).

I was looking into creating some sort of configuration file that can be read by Airflow and used by the various DAGs whenever a certain property is required, but I cannot find any documentation or guide on how to do this. The most I could find was the documentation on setting up Connection IDs, but that does not meet my use case.

So, the question: is the above scenario possible, and if so, how?

Thanks in advance.

Upvotes: 5

Views: 12519

Answers (1)

Viraj Parekh

Reputation: 1381

There are a few ways you can accomplish this based on your setup:

  • You can use a DagFactory-type approach, where a function generates your DAGs. You can find an example of what that looks like here; a minimal sketch of the pattern is also included at the end of this answer.

  • You can store a JSON config as an Airflow Variable and parse through it to generate a DAG. You can store something like this under Admin -> Variables:


[
  {
    "table": "users",
    "schema": "app_one",
    "s3_bucket": "etl_bucket",
    "s3_key": "app_one_users",
    "redshift_conn_id": "postgres_default"
  },
  {
    "table": "users",
    "schema": "app_two",
    "s3_bucket": "etl_bucket",
    "s3_key": "app_two_users",
    "redshift_conn_id": "postgres_default"
  }
]

Your DAG could then be generated along these lines (the imports and the DAG object itself are filled in here with illustrative values):

import json
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.redshift_to_s3_operator import RedshiftToS3Transfer

# Pull the JSON config stored under Admin -> Variables as "sync_config"
sync_config = json.loads(Variable.get("sync_config"))

# dag_id, start_date and schedule_interval are illustrative values
dag = DAG('redshift_to_s3_sync', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

with dag:
    start = DummyOperator(task_id='begin_dag')
    # One RedshiftToS3Transfer task per entry in the config
    for table in sync_config:
        d1 = RedshiftToS3Transfer(
            task_id=table['s3_key'],
            table=table['table'],
            schema=table['schema'],
            s3_bucket=table['s3_bucket'],
            s3_key=table['s3_key'],
            redshift_conn_id=table['redshift_conn_id'],
        )
        start >> d1

Similarly, you can just store that config as a local file and open it as you would any other file; a minimal sketch of that approach, and of the factory approach from the first bullet, follows. Keep in mind that the best answer here will depend on your infrastructure and use case.
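For the local-file approach, a sketch might look like the following. The file path, the "input_dir" key, and the shell command are assumptions made up for illustration; the point is simply that every DAG reads the same file, so a change only has to be made in one place.

import json
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Hypothetical shared config file kept next to the DAG definitions,
# e.g. $AIRFLOW_HOME/dags/config/etl_settings.json containing {"input_dir": "/data/incoming"}
CONFIG_PATH = os.path.join(os.path.dirname(__file__), 'config', 'etl_settings.json')

with open(CONFIG_PATH) as f:
    settings = json.load(f)

# dag_id, start_date and schedule_interval are illustrative values
dag = DAG('local_config_example', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

with dag:
    # Every DAG that opens the same file picks up the same directory
    list_files = BashOperator(
        task_id='list_input_files',
        bash_command='ls {}'.format(settings['input_dir']),
    )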
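For the factory approach, the idea is to keep the shared properties and the DAG-building logic in one function and call it once per DAG. A minimal sketch, in which the table names, the shared directory, and process.sh are all made up for illustration:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Shared property lives in exactly one place
SHARED_INPUT_DIR = '/data/incoming'


def create_dag(dag_id, table):
    """Build a DAG that processes one table out of the shared input directory."""
    dag = DAG(dag_id, start_date=datetime(2018, 1, 1), schedule_interval='@daily')
    with dag:
        BashOperator(
            task_id='process_{}'.format(table),
            bash_command='process.sh {}/{}'.format(SHARED_INPUT_DIR, table),
        )
    return dag


# Register one DAG per table; Airflow discovers DAGs by scanning module-level globals
for table_name in ['users', 'orders']:
    dag_id = 'factory_{}'.format(table_name)
    globals()[dag_id] = create_dag(dag_id, table_name)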

Upvotes: 6
