Reputation: 93
I am running a series of tasks which depend on one another in a complex manner. I would like to describe these dependencies as a DAG (directed acyclic graph) and execute the graph when needed.
I have been looking at airflow, and wrote a dummy script:
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def cloud_runner():
    # my typical usage here would be a http call to a service (e.g. gcp cloudrun)
    pass


with DAG(dag_id="my_id", schedule_interval=None, start_date=datetime.max) as dag:
    first_task = PythonOperator(task_id="1", python_callable=cloud_runner)
    second_task = PythonOperator(task_id="2", python_callable=cloud_runner)
    second_task_bis = PythonOperator(task_id="2bis", python_callable=cloud_runner)
    third_task = PythonOperator(task_id="3", python_callable=cloud_runner)

    first_task >> [second_task, second_task_bis] >> third_task
Running the following command does the job:
airflow dags backfill my_id --start-date 2020-01-02
PROBLEM:
My usage will never involve any scheduling / start-date / end-date of any kind. Moreover, my DAG will be executed from a Python Flask server.
QUESTION:
Is there a way to achieve the same result without Airflow? Or to use Airflow in a trigger-only mode (without all the scheduling parts, airflow.db, etc.) from a standalone Python script?
Thanks
Upvotes: 0
Views: 821
Reputation: 15961
Airflow is both a library and an application. DAGs don't have to run on a schedule; you can trigger them on demand with the API or the CLI. However, you cannot run a DAG (scheduled or manually triggered) if the Airflow application isn't running: Airflow requires the scheduler and the metadata database to be up.
To answer your question: no. You must set up and run Airflow to get a DAG running.
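For the on-demand part, `airflow dags trigger my_id` is the CLI equivalent of a manual run. Below is a minimal sketch of triggering the same DAG from Python (e.g. from your Flask server) via the stable REST API, assuming an Airflow 2.x webserver reachable at http://localhost:8080 with the basic-auth API backend enabled; the host and credentials are placeholders, not part of your setup:

import requests

# Create a DAG run on demand through the stable REST API; the webserver,
# scheduler and metadata database still have to be running for this to work.
response = requests.post(
    "http://localhost:8080/api/v1/dags/my_id/dagRuns",
    auth=("admin", "admin"),  # placeholder credentials
    json={"conf": {}},        # optional run-time configuration for the run
)
response.raise_for_status()
print(response.json())  # details of the DagRun that was just created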
Upvotes: 1