romain-nio

Reputation: 1205

How to deal with DAG lib in airflow?

I've got a little question about dependency management for the packages used in Python operators.

We are using Airflow in an industrialized way to run scheduled Python jobs. It works well, but we are facing issues dealing with the different Python libraries needed by each DAG.

Do you have any idea how to let developers install their own dependencies for their jobs without being admins, while making sure those dependencies don't collide with other jobs?

Would you recommend a Bash task that loads a virtualenv at the beginning of the job? Is there any official recommendation for doing this?

Thanks! Romain.

Upvotes: 9

Views: 2195

Answers (1)

Matthijs Brouns

Reputation: 2329

In general I see two possible solutions for your problem:

  1. Airflow has a PythonVirtualenvOperator, which allows a task to run in a virtualenv that gets created and destroyed automatically. You can pass a python_version and a list of requirements to the task to build the virtual env (see the first sketch below).

  2. Set up a Docker registry and use a DockerOperator rather than a PythonOperator. This allows teams to build their own Docker images with their specific requirements (see the second sketch below). This is how I believe Heineken set up their Airflow jobs, as presented at their Airflow meetup. I've been trying to see whether they posted their slides online, but I can't seem to find them.
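
As a rough illustration of option 1, here is a minimal sketch, assuming Airflow 2.x import paths and the newer schedule argument; the DAG/task names and the pandas pin are arbitrary examples, not anything from the question:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator


def run_job():
    # Imports go inside the callable: it executes in the freshly
    # created virtualenv, not in the Airflow worker's environment.
    import pandas as pd
    print(f"running with pandas {pd.__version__}")


with DAG(
    dag_id="virtualenv_isolation_example",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    PythonVirtualenvOperator(
        task_id="run_in_virtualenv",
        python_callable=run_job,
        requirements=["pandas==2.1.4"],  # per-task dependency pins
        python_version="3.11",           # interpreter for the venv
        system_site_packages=False,      # isolate fully from the worker env
    )
```

The venv is built from the requirements list on every run, so each task gets its own dependency set without any admin intervention, at the cost of install time per task.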
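
And a similar sketch for option 2, assuming the apache-airflow-providers-docker package is installed; the registry URL, image tag, and script path are hypothetical placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="docker_isolation_example",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    DockerOperator(
        task_id="run_in_container",
        # Hypothetical image: each team bakes its dependencies into an
        # image and pushes it to a shared registry.
        image="registry.example.com/team-a/etl-job:1.4.2",
        command="python /app/run_job.py",
        docker_url="unix://var/run/docker.sock",  # local Docker daemon
    )
```

Here the dependency isolation happens at image-build time rather than at task runtime, which makes runs faster and more reproducible, but requires maintaining a registry and a build pipeline.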

Upvotes: 10
