Andrew
Andrew

Reputation: 6900

Why do Dataflow steps not start?

I have a linear three step Dataflow pipeline - for some reason the last step started, but the preceding two steps hung in Not started for a long time before I gave up and killed the job. I'm not sure what caused this, as this same pipeline had successfully run in the past, and I'm surprised it didn't show any errors in the logs as to what was preventing the first two steps from starting. What can cause such a situation and how can I prevent it from happening?

Upvotes: 4

Views: 2842

Answers (1)

Andrew
Andrew

Reputation: 6900

This was happening because of an error in the worker start up. Certain Dataflow steps do not seem to require workers (e.g. writing to GCS), which is why that step was able to start - i.e. that step starting does not imply that workers are being created correctly. Worker start up is not displayed in the job logs by default - you need to click the link to Stackdriver in the job logs and then add worker-startup in the logs drop down in order to see any of those errors.

Upvotes: 5

Related Questions