Thelight

Reputation: 369

Python dataflow job from cloud composer get stuck with Running

The Airflow DAG below (Cloud Composer) gets stuck with this message:

{base_task_runner.py:113} INFO - Job 5865: Subtask my_task 
{gcp_dataflow_hook.py:121} INFO - Running command /home/airflow/gcs/dags/dataflow/pyfile.py --runner DataflowRunner......

I don't see the job submitted in Dataflow. Any idea what is missing here?

from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

task1 = DataFlowPythonOperator(
    task_id='my_task',
    py_file='/home/airflow/gcs/dags/dataflow/pyfile.py',
    gcp_conn_id='google_cloud_default',
    options={
        "query": 'SELECT * FROM `myproject.myds.mytable`',  # closing backtick was missing
        "output": 'gs://path/',
        "job_name": 'my-job',  # Beam's option is job_name, not jobname
    },
    dataflow_default_options={
        "project": 'my-project',
        "staging_location": 'gs://path/Staging/',
        "temp_location": 'gs://path/Temp/',
    },
    dag=dag,
)

Upvotes: 0

Views: 405

Answers (1)

FaisalHBTI

Reputation: 116

Do the following:

  1. Check the Dataflow jobs list in the Google Cloud Platform console to see whether your job was actually submitted.

  2. Try running the /home/airflow/gcs/dags/dataflow/pyfile.py script locally with the same command, python /home/airflow/gcs/dags/dataflow/pyfile.py --runner DataflowRunner....... Most probably the blocker is this script; a minimal sketch of such a script follows this list.

  3. Pass additional parameters as required, such as numWorkers, maxNumWorkers, region, worker_zone, etc.; an example of passing these through the operator appears after this list.
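For step 2, here is a minimal sketch of what pyfile.py might look like, assuming (as the question's options suggest) it reads from BigQuery with a query and writes text output to GCS. The file contents and argument names are assumptions for illustration, not the asker's actual script:

# Hypothetical pyfile.py: reads --query from BigQuery, writes --output to GCS.
import argparse

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--query', required=True, help='BigQuery SQL to read')
    parser.add_argument('--output', required=True, help='GCS path prefix to write to')
    known_args, pipeline_args = parser.parse_known_args(argv)

    # Everything else (--runner, --project, --staging_location, ...) is
    # forwarded to Beam as pipeline options.
    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
        (p
         | 'ReadFromBigQuery' >> beam.io.Read(
             beam.io.BigQuerySource(query=known_args.query, use_standard_sql=True))
         | 'FormatAsText' >> beam.Map(str)
         | 'WriteToGCS' >> beam.io.WriteToText(known_args.output))


if __name__ == '__main__':
    run()

Running it locally this way surfaces import errors, syntax errors, or hangs in the script itself before the operator ever gets involved.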
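For step 3, a sketch of how those parameters could be passed through the operator. The key names follow the Python SDK's snake_case pipeline options (numWorkers/maxNumWorkers above are the Java-style spellings), and the region, zone, and worker counts are placeholder values:

task1 = DataFlowPythonOperator(
    task_id='my_task',
    py_file='/home/airflow/gcs/dags/dataflow/pyfile.py',
    gcp_conn_id='google_cloud_default',
    options={
        "query": 'SELECT * FROM `myproject.myds.mytable`',
        "output": 'gs://path/',
    },
    dataflow_default_options={
        "project": 'my-project',
        "staging_location": 'gs://path/Staging/',
        "temp_location": 'gs://path/Temp/',
        "region": 'us-central1',         # placeholder: pick your region
        "num_workers": 2,                # initial worker count
        "max_num_workers": 10,           # autoscaling ceiling
        "worker_zone": 'us-central1-a',  # placeholder: pick your zone
    },
    dag=dag,
)

The hook passes each key as a --key=value flag on the command it runs, so anything the Dataflow runner accepts on the command line can be added here.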

Upvotes: 2
