Kelly Westbrooks

Reputation: 3

How many JVMs are instantiated in each GCE instance in Google Cloud Dataflow?

Am I always guaranteed to have exactly one Dataflow worker JVM per GCE instance, or could the scheduler ever spin up multiple JVMs on a single GCE instance? For example, could that happen when many transforms are ready to run but there are relatively few GCE instances to run them on?

Upvotes: 0

Views: 204

Answers (1)

Jeremy Lewi

Reputation: 6776

The Dataflow service provides no guarantee about the number of worker JVMs per GCE instance.

In the current implementation there is 1 worker per VM. The worker actually runs inside a Docker container which provides some isolation from other processes on the host.

The number of workers per VM is likely to change in the future in order to make better use of multicore VMs.

Similarly, right now we are using a single thread in the JVM to process work.

You can think of a work unit as a subset of records to be processed by one or more transforms.
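Since neither the number of JVMs per VM nor the number of threads per JVM is guaranteed, any per-JVM state your transforms share is safest when it tolerates multiple threads and does not assume it is the only copy on the machine. Below is a minimal sketch in plain Java of one way to do that. It uses no Dataflow API, and `SharedResource` is a hypothetical name made up for illustration; it stands in for something expensive such as a client or a cache.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-JVM resource; not part of any Dataflow SDK.
public final class SharedResource {
    // Counts how many times the initialization ran in this JVM.
    private static final AtomicInteger INIT_COUNT = new AtomicInteger();

    private final String jvmName;

    private SharedResource() {
        INIT_COUNT.incrementAndGet();
        // Usually looks like "pid@hostname", but the format is JVM-specific.
        // Useful in logs to see which JVM on a VM handled a record.
        this.jvmName = ManagementFactory.getRuntimeMXBean().getName();
    }

    // Initialization-on-demand holder: the JVM's class initialization
    // rules ensure INSTANCE is created once, even with many threads.
    private static final class Holder {
        static final SharedResource INSTANCE = new SharedResource();
    }

    public static SharedResource get() {
        return Holder.INSTANCE;
    }

    public String jvmName() {
        return jvmName;
    }

    public static int initCount() {
        return INIT_COUNT.get();
    }
}
```

Code inside a transform would call `SharedResource.get()` wherever it needs the resource. If several JVMs ever share a VM, each JVM gets its own instance, so anything that must be unique per machine (a fixed port, a lock file) needs a different mechanism.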

Upvotes: 2
