Reputation: 51
When I start a Dataflow job, it sometimes waits for more than 30 minutes without being allocated an instance.
What is happening?
Upvotes: 0
Views: 1829
Reputation: 1552
Your Dataflow job can be slow to start because the time needed to start VMs on Google Compute Engine grows with the number of VMs you request, and in general VM startup and shutdown performance can have high variance.
You can look at Cloud Logging for your job ID to see whether any logging is going on, and you can also check the Dataflow monitoring interface for your job.[1]
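As a sketch, you can inspect a stuck job from the command line with `gcloud`; the job ID and region below are placeholders:

```shell
# Show the job's state, worker pool, and any reported errors
# (replace JOB_ID and the region with your own values).
gcloud dataflow jobs describe JOB_ID --region=us-central1

# Read recent worker logs for the job via Cloud Logging.
gcloud logging read \
  'resource.type="dataflow_step" AND resource.labels.job_id="JOB_ID"' \
  --limit=20
```

If the job is stuck waiting for workers, the describe output and logs will usually show whether VMs are still being provisioned or whether a quota/permissions error is blocking allocation.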
You can enable autoscaling[2] instead of specifying a large number of instances manually; the service should then gradually scale to the appropriate number of VMs at the appropriate moment in the job's lifetime.
Without autoscaling, you have to choose a fixed number of workers to execute your pipeline. As the input workload varies over time, this number can become either too high or too low. Provisioning too many workers results in unnecessary extra cost, and provisioning too few workers results in higher latency for processed data. By enabling autoscaling, resources are used only as they are needed.
The objective of autoscaling is to minimize backlog while maximizing worker utilization and throughput, and to react quickly to spikes in load.
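For example, with the Apache Beam Python SDK you can turn on throughput-based autoscaling and cap the worker count when launching the pipeline; the pipeline file, project, and region below are placeholders:

```shell
# Launch a Beam pipeline on Dataflow with autoscaling enabled.
# my_pipeline.py, my-project, and the region are placeholder values.
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --autoscaling_algorithm=THROUGHPUT_BASED \
  --max_num_workers=10
```

With these options Dataflow chooses the number of workers up to `--max_num_workers` instead of you fixing it with `--num_workers`.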
[1] https://cloud.google.com/dataflow/docs/guides/using-monitoring-intf
[2] https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#streaming-autoscaling
Upvotes: 1