Reputation: 3
Am I always guaranteed to have exactly one Dataflow worker JVM per GCE instance, or could the scheduler ever spin up multiple JVMs on a single GCE instance, for example when many transforms are ready to run but relatively few GCE instances are available to run them on?
Upvotes: 0
Views: 204
Reputation: 6776
The Dataflow service provides no guarantee about the number of worker JVMs per GCE instance.
In the current implementation there is 1 worker per VM. The worker actually runs inside a Docker container which provides some isolation from other processes on the host.
The number of workers per VM is likely to change in the future in order to make better use of multicore VMs.
Similarly, right now we use a single thread in the JVM to process work units.
You can think of a work unit as a subset of records to be processed by one or more transforms.
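Since one worker per VM and one processing thread per JVM are described only as the current implementation, code that keeps per-JVM state is safer if it does not rely on either. As an illustration only, here is a minimal sketch of thread-safe lazy initialization using the initialization-on-demand holder idiom. `SharedResource` and `initCount` are hypothetical names, not part of the Dataflow SDK, and the sketch assumes the resource is cheap to duplicate if several JVMs ever share a host.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not part of the Dataflow SDK. */
public final class SharedResource {
    // Counts how many times the resource was constructed in this JVM.
    private static final AtomicInteger INIT_COUNT = new AtomicInteger();

    private SharedResource() {
        INIT_COUNT.incrementAndGet();
    }

    // Initialization-on-demand holder: the JVM initializes Holder exactly
    // once, even if several threads call get() at the same time.
    private static final class Holder {
        static final SharedResource INSTANCE = new SharedResource();
    }

    /** Returns the single instance for this JVM (not for the whole VM). */
    public static SharedResource get() {
        return Holder.INSTANCE;
    }

    /** Number of constructions in this JVM; stays at 1 after first use. */
    public static int initCount() {
        return INIT_COUNT.get();
    }
}
```

The instance is unique per JVM only. If more than one worker JVM ever runs on the same VM, each would hold its own copy, so anything that must be unique per host needs coordination outside the JVM.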
Upvotes: 2