Darren Olivier

Reputation: 223

Cloud Dataflow job reading from BigQuery stuck before starting

I have a Cloud Dataflow job that's stuck in the initiation phase and never runs any application logic. To confirm this, I added a log statement inside the processElement step, but it never appears in the logs, so it seems that step is never reached.

All I can see in the logs is the following message, which appears every minute:

logger: Starting supervisor: /etc/supervisor/supervisord_watcher.sh: line 36: /proc//oom_score_adj: Permission denied

And these which loop every few seconds:

VM is healthy? true.

http: TLS handshake error from 172.17.0.1:38335: EOF

Job is in state JOB_STATE_RUNNING, will check again in 30 seconds.

The job ID is 2015-09-14_06_30_22-15275884222662398973, though I have an additional two jobs (2015-09-14_05_59_30-11021392791304643671, 2015-09-14_06_08_41-3621035073455045662) that I started this morning which have the same problem.

Any ideas on what might be causing this?

Upvotes: 2

Views: 1473

Answers (1)

Ben Chambers

Reputation: 6130

It sounds like your pipeline has a BigQuery source followed by a DoFn. Before running your DoFn (and therefore reaching your logging statement), the pipeline runs a BigQuery export job to create a snapshot of the data in GCS. This ensures that the pipeline gets a consistent view of the data contained in the BigQuery tables.
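For reference, a minimal sketch (Dataflow SDK 1.x, Java) of the pipeline shape described above: a BigQuery read followed by a DoFn. The project, dataset, and table names are placeholders, not taken from your job.

```java
import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

public class BigQueryReadExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).create());

    p.apply(BigQueryIO.Read.named("ReadTable")
            .from("my-project:my_dataset.my_table")) // hypothetical table
     .apply(ParDo.of(new DoFn<TableRow, TableRow>() {
       @Override
       public void processElement(ProcessContext c) {
         // This line only executes once the BigQuery export to GCS
         // has completed and elements start flowing into the DoFn.
         System.out.println("Processing row: " + c.element());
         c.output(c.element());
       }
     }));

    p.run();
  }
}
```

Until the export job behind the BigQueryIO.Read step finishes, nothing inside processElement will run, which is why the log line never appears.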

It seems the BigQuery export job for your table took a long time. Unfortunately, there isn't a progress indicator for the export process. If you run the pipeline again and let it run longer, the export should complete and your DoFn will start running.

We are looking into improving the user experience of the export job as well as figuring out why it took longer than we expected.

Upvotes: 2
