Reputation: 6900
I have a linear three-step Dataflow pipeline. For some reason the last step started, but the preceding two steps hung in "Not started" for a long time before I gave up and killed the job. I'm not sure what caused this, since the same pipeline had run successfully in the past, and I'm surprised the logs showed no errors explaining what was preventing the first two steps from starting. What can cause such a situation, and how can I prevent it from happening?
Upvotes: 4
Views: 2842
Reputation: 6900
This was happening because of an error during worker startup. Certain Dataflow steps do not seem to require workers (e.g. writing to GCS), which is why that step was able to start; in other words, that step starting does not imply that workers are being created correctly. Worker startup messages are not displayed in the job logs by default: you need to click the link to Stackdriver in the job logs and then add "worker-startup" in the logs drop-down in order to see any of those errors.
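If you would rather query for those entries than click through the drop-down, a log filter along these lines may work. This is only a sketch: the `dataflow_step` resource type, the `job_id` label, and the `dataflow.googleapis.com/worker-startup` log ID are my assumptions about how Dataflow labels its logs, and `worker_startup_filter`, the project ID, and the job ID are hypothetical names made up for illustration.

```python
from urllib.parse import quote


def worker_startup_filter(project_id: str, job_id: str) -> str:
    """Build a log filter string for one Dataflow job's worker-startup log.

    Assumes Dataflow entries use resource type "dataflow_step" and the
    log ID "dataflow.googleapis.com/worker-startup"; check both against
    an actual entry from your project before relying on this.
    """
    # The log ID is URL-encoded inside logName, so "/" becomes "%2F".
    log_id = quote("dataflow.googleapis.com/worker-startup", safe="")
    return " AND ".join([
        'resource.type="dataflow_step"',
        f'resource.labels.job_id="{job_id}"',
        f'logName="projects/{project_id}/logs/{log_id}"',
    ])


# Placeholder IDs; substitute your own project and job.
print(worker_startup_filter("my-project", "my-job-id"))
```

The resulting string can be pasted into the log viewer's query box or, I believe, passed as the filter argument to `gcloud logging read`.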
Upvotes: 5