Reputation: 369
The Airflow DAG below (running on Cloud Composer) is getting stuck with this message:
{base_task_runner.py:113} INFO - Job 5865: Subtask my_task
{gcp_dataflow_hook.py:121} INFO - Running command /home/airflow/gcs/dags/dataflow/pyfile.py --runner DataflowRunner......
I don't see the job submitted in Dataflow. Any idea what is missing here?
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

task1 = DataFlowPythonOperator(
    task_id='my_task',
    # pipeline script staged in the Composer bucket, mounted under /home/airflow/gcs/
    py_file='/home/airflow/gcs/dags/dataflow/pyfile.py',
    gcp_conn_id='google_cloud_default',
    # each key here is passed to the script as a --key=value flag
    options={
        "query": 'SELECT * from `myproject.myds.mytable`',
        "output": 'gs://path/',
        "job_name": 'my-job'  # Beam's Dataflow option is job_name (snake_case)
    },
    dataflow_default_options={
        "project": 'my-project',
        "staging_location": 'gs://path/Staging/',
        "temp_location": 'gs://path/Temp/',
    },
    dag=dag
)
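For context, here is a minimal sketch of the kind of pipeline pyfile.py is assumed to contain; the --query and --output custom options mirror the keys in the options dict above, but the body itself is an assumption, not the actual script:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class UserOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # these mirror the keys the operator passes as --key=value flags
        parser.add_argument('--query')
        parser.add_argument('--output')

def run():
    # picks up --runner, --project, --staging_location, etc. from sys.argv
    options = PipelineOptions()
    user_options = options.view_as(UserOptions)
    with beam.Pipeline(options=options) as p:
        (p
         | 'ReadFromBQ' >> beam.io.Read(beam.io.BigQuerySource(
               query=user_options.query, use_standard_sql=True))
         | 'Format' >> beam.Map(str)
         | 'Write' >> beam.io.WriteToText(user_options.output))

if __name__ == '__main__':
    run()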
Upvotes: 0
Views: 405
Reputation: 116
Do the following:

1. Check the Dataflow jobs list in the Google Cloud Platform console to see whether your job was submitted at all.
2. Try running the /home/airflow/gcs/dags/dataflow/pyfile.py script locally with the same command: python /home/airflow/gcs/dags/dataflow/pyfile.py --runner DataflowRunner...... Most probably the blocker is this script.
3. Pass additional parameters as required, like numWorkers, maxNumWorkers, region, worker_zone, etc. (see the sketch below this list).
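For the last point, a minimal sketch of how those extras could be passed through dataflow_default_options; the region and worker counts are placeholder values, and the snake_case names are the Python/Beam equivalents of the numWorkers/maxNumWorkers names above:

task1 = DataFlowPythonOperator(
    task_id='my_task',
    py_file='/home/airflow/gcs/dags/dataflow/pyfile.py',
    gcp_conn_id='google_cloud_default',
    options={
        "query": 'SELECT * from `myproject.myds.mytable`',
        "output": 'gs://path/',
    },
    dataflow_default_options={
        "project": 'my-project',
        "staging_location": 'gs://path/Staging/',
        "temp_location": 'gs://path/Temp/',
        "region": 'us-central1',   # placeholder: pick your Dataflow region
        "num_workers": 2,          # initial worker count (assumed value)
        "max_num_workers": 10,     # autoscaling ceiling (assumed value)
    },
    dag=dag
)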
Upvotes: 2