Reputation: 69
(First-time user of Cloud Composer.) All the examples I have seen define very simple Python functions within the DAG.
I have multiple lengthy Python scripts I want to run. Can I put these inside a task?
If so, is it better to use the PythonOperator or to call them from the BashOperator?
E.g. something like:
default_dag_args = {}

with models.DAG('jobname',
                schedule_interval=datetime.timedelta(days=1),
                default_args=default_dag_args) as dag:

    do_stuff1 = python_operator.PythonOperator(
        task_id='task_1',
        python_callable=myscript1.py)

    do_stuff2 = python_operator.PythonOperator(
        task_id='task_2',
        python_callable=myscript2.py)
Upvotes: 2
Views: 3701
Reputation: 2566
If you put your Python scripts into separate files, you can use either PythonOperator or BashOperator to execute them.
Let's assume you place the scripts under the following folder structure:
dags/
    my_dag.py
    tasks/
        myscript1.py
        myscript2.py
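For the PythonOperator variant below, each script needs an importable entrypoint that the DAG file can call. A minimal sketch of what tasks/myscript1.py might look like (the main() function and its contents are assumptions for illustration, not part of the original question):

# tasks/myscript1.py

def main():
    # the lengthy processing logic of the script goes here
    print("myscript1 finished")

if __name__ == "__main__":
    # lets the same file also be run directly, e.g. via BashOperator
    main()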
Using PythonOperator
in my_dag.py
from datetime import timedelta
from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator
from tasks import myscript1, myscript2
default_dag_args = {}
with DAG(
    "jobname",
    schedule_interval=timedelta(days=1),
    default_args=default_dag_args,
) as dag:
    do_stuff1 = PythonOperator(
        task_id="task_1",
        python_callable=myscript1.main,  # assume the entrypoint is main()
    )
    do_stuff2 = PythonOperator(
        task_id="task_2",
        python_callable=myscript2.main,  # assume the entrypoint is main()
    )
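If task_2 should only run after task_1 finishes, the two tasks can be chained inside the same with block using Airflow's bitshift dependency syntax (a small usage note, not part of the original answer):

    do_stuff1 >> do_stuff2  # task_2 waits for task_1 to succeed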
Using BashOperator
in my_dag.py
from datetime import timedelta
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
default_dag_args = {}
with DAG(
    "jobname",
    schedule_interval=timedelta(days=1),
    default_args=default_dag_args,
) as dag:
    do_stuff1 = BashOperator(
        task_id="task_1",
        bash_command="python /path/to/myscript1.py",
    )
    do_stuff2 = BashOperator(
        task_id="task_2",
        bash_command="python /path/to/myscript2.py",
    )
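Since this is Cloud Composer, the dags/ folder of the environment's GCS bucket is synced to /home/airflow/gcs/dags on the workers, so the placeholder path could look roughly like this (a sketch; adjust to your actual layout):

    do_stuff1 = BashOperator(
        task_id="task_1",
        bash_command="python /home/airflow/gcs/dags/tasks/myscript1.py",
    )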
Upvotes: 1