Reputation: 330
My DAG takes about 50 seconds to parse. I only use external triggers to start DAG runs, no schedules. I notice Airflow fills the DagBag a lot: on every trigger_dag command, and in the background, where it keeps checking the dags folder and creates .pyc files seemingly instantly once a new .py file is deployed.
Is there any way I can deploy my cluster and have the DAGs loaded once? Then, for the next 2 weeks, DAG runs would start instantly on any trigger_dag (right now it takes 50 seconds just to fill the DagBag before a run starts). I have no need to update DAG definitions within those 2 weeks.
Upvotes: 9
Views: 8252
Reputation: 10030
50 seconds is an extremely long time for DAG instantiation. It looks like your DAG file runs a large or long-running piece of code at the top level. That is very bad practice:
Note: This means all top level code (ie. anything that isn't defining the DAG) in a DAG file will get run each scheduler heartbeat. Try to avoid top level code to your DAG file unless absolutely necessary.
Airflow works exactly as you described, which is why you should treat the Python files in your DAG folder mostly as configuration files (with some programmatic capabilities). You can't change this with a magic config key or anything like it. This behaviour is at the core of Airflow.
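To illustrate the advice about top-level code, here is a minimal sketch in plain Python that moves slow work out of module scope and into the function a task would call. The names `load_expensive_config` and `process_tables` are hypothetical, and the `time.sleep` call only stands in for whatever makes your file slow to parse. I am assuming the 50 seconds comes from work of this kind, which your question does not confirm.

```python
import time


def load_expensive_config():
    """Hypothetical stand-in for slow work (database query, API call, big file read)."""
    time.sleep(0.2)  # simulates the slow part
    return {"tables": ["a", "b", "c"]}


# Slow pattern: at module level this would run every time the file is parsed.
# CONFIG = load_expensive_config()


def process_tables(**context):
    """Task callable: the slow work now happens only when the task executes."""
    config = load_expensive_config()
    return [name.upper() for name in config["tables"]]


# In a real DAG file, process_tables would be passed to an operator, for example
# PythonOperator(task_id="process_tables", python_callable=process_tables, dag=dag),
# so that parsing the file only builds the DAG structure.
```

With this layout, importing the file costs almost nothing, and the slow call is paid once per task run instead of once per parse.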
Upvotes: 5