Reputation: 1343
I've been trying to set up a parent DAG that has two subdags, each running at a slightly different time due to the availability of their respective data sources. However, the subdags seem to be kicked off immediately with the parent DAG, disregarding their own schedule_intervals. Does anyone know if this is the default behavior for Airflow? Is there a way to get around it without turning them into standalone DAGs or using sensors?
Upvotes: 4
Views: 5666
Reputation: 6841
The subdag is going to obey the parent DAG's schedule (since it's the parent that triggers the subdag) and won't run on its own schedule unless it's configured as a standalone DAG.
Probably what you really want is some other type of dependency mechanism, so I'm guessing at your scenario here.
I'm not sure why you wouldn't want DagA and DagB to be standalone DAGs, but if you really want to preserve your structure you can set the parent DAG's schedule to the greatest common divisor of the schedules of DagA and DagB and add conditional flows to skip them on runs where they're not due (see the sketch below).
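A minimal sketch of that idea, assuming the parent runs hourly (the GCD of the hypothetical child schedules) and one child is only due every 6 hours; the DAG ids, schedules, and the tiny inline subdag are all made up for illustration:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import ShortCircuitOperator
    from airflow.operators.subdag_operator import SubDagOperator

    DEFAULT_ARGS = {"start_date": datetime(2017, 1, 1)}

    parent = DAG("parent_dag", default_args=DEFAULT_ARGS, schedule_interval="0 * * * *")


    def make_subdag(parent_dag_id, child_id, args):
        # Placeholder child DAG; same schedule_interval as the parent.
        child = DAG(
            dag_id="%s.%s" % (parent_dag_id, child_id),
            default_args=args,
            schedule_interval="0 * * * *",
        )
        DummyOperator(task_id="placeholder", dag=child)
        return child


    def dag_a_is_due(execution_date, **_):
        # Only let DagA run on executions at 00:00, 06:00, 12:00, 18:00;
        # ShortCircuitOperator skips downstream tasks when this returns False.
        return execution_date.hour % 6 == 0


    check_dag_a_due = ShortCircuitOperator(
        task_id="check_dag_a_due",
        python_callable=dag_a_is_due,
        provide_context=True,
        dag=parent,
    )

    run_dag_a = SubDagOperator(
        task_id="dag_a",
        subdag=make_subdag("parent_dag", "dag_a", DEFAULT_ARGS),
        dag=parent,
    )

    check_dag_a_due >> run_dag_a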
On the other hand, I would suggest you try to map dependencies explicitly in code instead of making them implicit through scheduling. If DagA depends on something external, be it a data source or another DAG, you can use a Sensor.
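For example, a minimal sketch of mapping a DAG-to-DAG dependency with an ExternalTaskSensor; the DAG ids, task ids, and schedule are assumptions, and the sensor's import path differs between Airflow versions:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.sensors.external_task_sensor import ExternalTaskSensor  # airflow.operators.sensors in older releases

    dag_b = DAG(
        "dag_b",
        default_args={"start_date": datetime(2017, 1, 1)},
        schedule_interval="@daily",
    )

    # Block DagB's run until the matching DagA run has finished its final task.
    wait_for_dag_a = ExternalTaskSensor(
        task_id="wait_for_dag_a",
        external_dag_id="dag_a",
        external_task_id="final_task",
        dag=dag_b,
    )

    process = DummyOperator(task_id="process_dag_a_output", dag=dag_b)

    wait_for_dag_a >> process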
Upvotes: 9
Reputation: 8239
If I understand correctly, this could be related to this blog post: https://medium.com/handy-tech/airflow-tips-tricks-and-pitfalls-9ba53fba14eb
Or to be more accurate: you can’t put a subdag in its own module in the dags folder unless you protect it with some kind of factory. Or to be even more accurate: you can, but then the subdag will be run on its own schedule, as well as by the subdag operator in the main dag.
We solve this by using a factory function.
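A minimal sketch of that factory-function approach, assuming hypothetical DAG and task ids: the child DAG object is only created inside a function, so the scheduler never picks it up as a top-level DAG and it runs only when the SubDagOperator in the parent triggers it.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.subdag_operator import SubDagOperator

    DEFAULT_ARGS = {"start_date": datetime(2017, 1, 1)}


    def subdag_factory(parent_dag_id, child_id, args):
        # Build and return the child DAG; note the required "<parent>.<child>" dag_id.
        subdag = DAG(
            dag_id="%s.%s" % (parent_dag_id, child_id),
            default_args=args,
            schedule_interval="@daily",
        )
        DummyOperator(task_id="load_data", dag=subdag)
        return subdag


    parent = DAG("parent_dag", default_args=DEFAULT_ARGS, schedule_interval="@daily")

    run_child = SubDagOperator(
        task_id="dag_a",
        subdag=subdag_factory("parent_dag", "dag_a", DEFAULT_ARGS),
        dag=parent,
    )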
Upvotes: 0