LeanneB
LeanneB

Reputation: 69

Can I execute python scripts from within a cloud composer DAG?

(First time user of cloud composer) All examples I have seen define very simple python functions within the DAG.

I have multiple lengthy python scripts I want to run. Can I put these inside a task?

If so, is it then better to use the PythonOperator or call them from the BashOperator?

E.g. something like

default_dag-args ={}
with models.DAG('jobname', schedule_interval = datetime.timedelta(days=1), default_args = default_dag_args) as dag: 
do_stuff1 = python_operator.PythonOperator(
    task_id ='task_1'
    python_callable =myscript1.py)
do_stuff2 = python_operator.PythonOperator(
    task_id ='task_2'
    python_callable =myscript2.py)

Upvotes: 2

Views: 3701

Answers (1)

Ryan Yuan
Ryan Yuan

Reputation: 2566

If you put your python scripts into separate files, you can actually use both PythonOperator and BashOperator to execute the scripts.

Let's assume you place your python scripts under the following folder structure.

dags/
    my_dag.py
    tasks/
         myscript1.py
         myscript2.py

Using PythonOperator in my_dag.py

from datetime import timedelta

from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator
from scripts import myscript1, myscript2

default_dag_args = {}

with DAG(
    "jobname",
    schedule_interval=timedelta(days=1),
    default_args=default_dag_args,
) as dag:
    do_stuff1 = PythonOperator(
        task_id="task_1",
        python_callable=myscript1.main,  # assume entrypoint is main()
    )
    do_stuff2 = PythonOperator(
        task_id="task_2",
        python_callable=myscript2.main,  # assume entrypoint is main()
    )

Using BashOperator in my_dag.py

from datetime import timedelta

from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator

default_dag_args = {}

with DAG(
    "jobname",
    schedule_interval=timedelta(days=1),
    default_args=default_dag_args,
) as dag:
    do_stuff1 = BashOperator(
        task_id="task_1",
        bash_command="python /path/to/myscript1.py",
    )
    do_stuff2 = BashOperator(
        task_id="task_2",
        bash_command="python /path/to/myscript2.py",
    )

Upvotes: 1

Related Questions