khusnanadia

Reputation: 853

Is there any TriggerRule for an Airflow operator with no trigger by time (needs to be triggered manually)?

I have a use case where I create two BigQueryOperator tasks that write to the same destination table, but I need one to run daily and the other to run manually, only when I need it.

Below is an illustration of the Tree View:

 | task_3rd_adhoc
 | task_3rd
 |---- task_2nd
        |---- task_1st_a
        |---- task_1st_b

In the example above, the DAG runs daily. The tasks should behave as follows:

  1. task_1st_a and task_1st_b run first. Target tables are:
    • project.dataset.table_1st_a with _PARTITIONTIME = execution date, and
    • project.dataset.table_1st_b with _PARTITIONTIME = execution date.
  2. task_2nd then runs after task_1st_a and task_1st_b finish. Its BigQueryOperator uses TriggerRule.ALL_SUCCESS. Target table is:
    • project.dataset.table_2nd with _PARTITIONTIME = execution date.
  3. task_3rd then runs after task_2nd succeeds. Its BigQueryOperator uses TriggerRule.ALL_SUCCESS. Target table is:
    • project.dataset.table_3rd with _PARTITIONTIME = D-2 from execution date.
  4. task_3rd_adhoc must not run in the daily job. I only need it when I want to backfill table project.dataset.table_3rd. Target table is:
    • project.dataset.table_3rd with _PARTITIONTIME = execution date.

But I still can't find the correct TriggerRule for step #4 above. I tried TriggerRule.DUMMY because I thought it could be used to set no trigger, but task_3rd_adhoc still ran in the daily job when I created the DAG above (even though, per the docs, its "dependencies are just for show, trigger at will").
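For reference, here is a minimal sketch of the wiring described above. Task ids mirror the Tree View; in the real DAG these are BigQueryOperator tasks, but DummyOperator is used here just to keep the example self-contained, and the dag_id and dates are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.utils.trigger_rule import TriggerRule

    # Placeholder DAG; the real tasks are BigQueryOperators writing to the
    # partitioned tables described above.
    with DAG(dag_id="bq_daily_pipeline",
             start_date=datetime(2019, 1, 1),
             schedule_interval="@daily") as dag:

        task_1st_a = DummyOperator(task_id="task_1st_a")
        task_1st_b = DummyOperator(task_id="task_1st_b")
        task_2nd = DummyOperator(task_id="task_2nd",
                                 trigger_rule=TriggerRule.ALL_SUCCESS)
        task_3rd = DummyOperator(task_id="task_3rd",
                                 trigger_rule=TriggerRule.ALL_SUCCESS)

        # This one should NOT run in the daily job -- only on demand.
        task_3rd_adhoc = DummyOperator(task_id="task_3rd_adhoc")

        [task_1st_a, task_1st_b] >> task_2nd >> task_3rd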

Upvotes: 0

Views: 1911

Answers (1)

y2k-shubham

Reputation: 11607

First of all, you've misunderstood TriggerRule.DUMMY.

  • Usually, when you wire tasks together task_a >> task_b, B would run only after A is complete (success / failed, based on B's trigger_rule).
  • TriggerRule.DUMMY means that even after wiring tasks A & B together as before, B would run independently of A ("trigger at will"). It doesn't mean B runs at your will; rather it runs at Airflow's will (the scheduler will trigger it whenever it feels like). So tasks having the dummy trigger rule will pretty much ALWAYS run, albeit at an unpredictable time.

What you need here (to have a particular task always present in the DAG but run it only when manually specified) is a combination of an Airflow Variable and an AirflowSkipException.

Here's roughly how you can do it:

  • A Variable should hold the flag for this task (whether or not it should run). You can, of course, edit this Variable anytime from the UI (thereby controlling whether or not that task runs in the next DagRun).
  • In the Operator's code (the execute() method for a custom operator, or just the python_callable in case of a PythonOperator), you'll check the value of that Variable (whether or not the task is supposed to run).
  • Based on the Variable's value, if the task is NOT supposed to run, you must throw an AirflowSkipException so that the task will be marked as skipped. Otherwise, it will just run as usual. A rough sketch follows this list.
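Here's a rough sketch of that approach, assuming the adhoc task from the question is implemented as a PythonOperator. The Variable name run_task_3rd_adhoc and the callable are illustrative, not a fixed API:

    from airflow.exceptions import AirflowSkipException
    from airflow.models import Variable
    from airflow.operators.python_operator import PythonOperator


    def _run_adhoc_bq_job(**context):
        # Read the control flag from an Airflow Variable (editable from the UI).
        should_run = Variable.get("run_task_3rd_adhoc", default_var="false")
        if should_run.lower() != "true":
            # Marks this task instance as "skipped" instead of success/failed.
            raise AirflowSkipException("run_task_3rd_adhoc is not set; skipping.")
        # ...otherwise run the actual BigQuery job here (e.g. via a BigQuery hook).


    task_3rd_adhoc = PythonOperator(
        task_id="task_3rd_adhoc",
        python_callable=_run_adhoc_bq_job,
        provide_context=True,
        dag=dag,  # the DAG object from the question
    )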

Upvotes: 1
