Soumik Das

Reputation: 167

Dataflow job neither completes nor fails after workers are started

I have created a Dataflow pipeline that reads a file from a Storage bucket and applies a simple transform to the data (e.g. trimming spaces).
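For reference, the pipeline is roughly the following minimal sketch (the bucket and file paths below are placeholders, not the real ones):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    # Runner, project, region etc. are supplied as command-line options.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (p
         | 'Read files' >> beam.io.ReadFromText('gs://my-bucket/input/data.txt')   # placeholder path
         | 'ManageData' >> beam.Map(lambda line: line.strip())                     # trim the spaces
         | 'Write' >> beam.io.WriteToText('gs://my-bucket/output/result'))         # placeholder path

if __name__ == '__main__':
    run()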

When I execute the Dataflow job, the job starts and the log shows that the workers are started in a zone, but after that nothing happens. The job never completes or fails, and I have to stop it manually.

The Dataflow job is executed by a service account that has the dataflow.worker, dataflow.developer and dataflow.objectAdmin roles.

Can someone please suggest why the Dataflow job never completes, or why nothing is executed after the worker is started?

2021-02-09 11:01:29.753 GMT Worker configuration: n1-standard-1 in europe-west2-b.
Warning
2021-02-09 11:01:30.015 GMT The network sdas-global-dev doesn't have rules that open TCP ports 12345-12346 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: No firewall rules associated with your network.
Info
2021-02-09 11:01:31.067 GMT Executing operation Read files/Read+ManageData/ParDo(ManageData)
Info
2021-02-09 11:01:31.115 GMT Starting 1 workers in europe-west2-b...
Warning
2021-02-09 11:07:33.341 GMT The network sdas-global-dev doesn't have rules that open TCP ports 12345-12346 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: No firewall rules associated with your network.

Upvotes: 0

Views: 686

Answers (1)

Soumik Das

Reputation: 167

I found the problem. I was running the job in a region different from the region of the VPC, so the worker was not able to spin up. Once I made the job's region the same as the VPC's region, everything worked.
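For example, something like the following sketch (the project, subnetwork and bucket names are placeholders; the key point is that the region option matches the region of the VPC subnetwork):

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',
    project='my-project',                                        # placeholder
    region='europe-west2',                                       # must match the VPC subnetwork's region
    subnetwork='regions/europe-west2/subnetworks/my-subnet',     # placeholder subnetwork
    temp_location='gs://my-bucket/temp',                         # placeholder bucket
)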

Upvotes: 1
