Tsume

Reputation: 897

How to work with configuration files in Airflow

In Airflow, we've created several DAGs, some of which share common properties, for example the directory to read files from. Currently, these properties are duplicated in each separate DAG, which will obviously become problematic in the future: if the directory name were to change, we'd have to go into each DAG and update that piece of code (and possibly miss one).

I was looking into creating some sort of configuration file that can be read by Airflow and used by the various DAGs whenever a certain property is required, but I cannot find any documentation or guide on how to do this. The most I could find was the documentation on setting up Connection IDs, but that does not meet my use case.

So, the question: is the above scenario possible, and if so, how?

Thanks in advance.

Upvotes: 5

Views: 12519

Answers (1)

Viraj Parekh

Reputation: 1381

There are a few ways you can accomplish this based on your setup:

  • You can use a DagFactory-type approach, where a function generates your DAGs. You can find an example of what that looks like here; a minimal sketch of the pattern is also included at the end of this answer.

  • You can store a JSON config as an Airflow Variable and parse through it to generate a DAG. You can store something like this under Admin -> Variables:


[
  {
    "table": "users",
    "schema": "app_one",
    "s3_bucket": "etl_bucket",
    "s3_key": "app_one_users",
    "redshift_conn_id": "postgres_default"
  },
  {
    "table": "users",
    "schema": "app_two",
    "s3_bucket": "etl_bucket",
    "s3_key": "app_two_users",
    "redshift_conn_id": "postgres_default"
  }
]

Your DAG could then be generated along these lines (the imports and the DAG object itself are filled in here with illustrative values):

import json
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.redshift_to_s3_operator import RedshiftToS3Transfer

# Pull the JSON config stored under Admin -> Variables as "sync_config"
sync_config = json.loads(Variable.get("sync_config"))

# dag_id, start_date and schedule_interval are illustrative values
dag = DAG('redshift_to_s3_sync', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

with dag:
    start = DummyOperator(task_id='begin_dag')
    # One RedshiftToS3Transfer task per entry in the config
    for table in sync_config:
        d1 = RedshiftToS3Transfer(
            task_id=table['s3_key'],
            table=table['table'],
            schema=table['schema'],
            s3_bucket=table['s3_bucket'],
            s3_key=table['s3_key'],
            redshift_conn_id=table['redshift_conn_id'],
        )
        start >> d1

Similarly, you can just store that config as a local file and open it as you would any other file; a minimal sketch of that approach, and of the factory approach from the first bullet, follows. Keep in mind that the best answer here will depend on your infrastructure and use case.
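For the local-file approach, a sketch might look like the following. The file path, the "input_dir" key, and the shell command are assumptions made up for illustration; the point is simply that every DAG reads the same file, so a change only has to be made in one place.

import json
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Hypothetical shared config file kept next to the DAG definitions,
# e.g. $AIRFLOW_HOME/dags/config/etl_settings.json containing {"input_dir": "/data/incoming"}
CONFIG_PATH = os.path.join(os.path.dirname(__file__), 'config', 'etl_settings.json')

with open(CONFIG_PATH) as f:
    settings = json.load(f)

# dag_id, start_date and schedule_interval are illustrative values
dag = DAG('local_config_example', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

with dag:
    # Every DAG that opens the same file picks up the same directory
    list_files = BashOperator(
        task_id='list_input_files',
        bash_command='ls {}'.format(settings['input_dir']),
    )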
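For the factory approach, the idea is to keep the shared properties and the DAG-building logic in one function and call it once per DAG. A minimal sketch, in which the table names, the shared directory, and process.sh are all made up for illustration:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Shared property lives in exactly one place
SHARED_INPUT_DIR = '/data/incoming'


def create_dag(dag_id, table):
    """Build a DAG that processes one table out of the shared input directory."""
    dag = DAG(dag_id, start_date=datetime(2018, 1, 1), schedule_interval='@daily')
    with dag:
        BashOperator(
            task_id='process_{}'.format(table),
            bash_command='process.sh {}/{}'.format(SHARED_INPUT_DIR, table),
        )
    return dag


# Register one DAG per table; Airflow discovers DAGs by scanning module-level globals
for table_name in ['users', 'orders']:
    dag_id = 'factory_{}'.format(table_name)
    globals()[dag_id] = create_dag(dag_id, table_name)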

Upvotes: 6
