Reputation: 149
In Apache Beam / Dataflow, I'm curious as to the relationship between number of cores across the entire job as it pertains to the number of worker VMs.
Specifically, when would a job benefit from "going wide" (many workers with a lower core count per worker) versus "going big" (fewer workers with a higher core count per worker)? Does this affect parallelism?
Example: 8 64-core workers vs. 128 4-core workers. Both configurations give the job 512 cores. It's unclear to me which I should opt for, especially now that the Shuffle service allows us to deploy workers with much smaller disks.
Thanks
Upvotes: 0
Views: 184
Reputation: 5104
Dataflow considers parallelism in terms of the number of cores, not the number of machines, so from its point of view 8 64-core machines and 128 4-core machines are the same. Larger machines can benefit some pipelines: workers can "share" common in-memory data structures such as a large ML model, per-worker startup overhead is slightly reduced, and for very large jobs there is by default a limit of 1,000 machines regardless of their size. For most jobs, though, it doesn't really matter.
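For illustration, here is a minimal sketch of how the two shapes you mention could be requested with the Beam Python SDK's standard Dataflow worker options; the machine type names are just examples, and any other pipeline options (project, region, temp location, etc.) are omitted.

    from apache_beam.options.pipeline_options import PipelineOptions

    # "Going big": 8 workers x 64 vCPUs = 512 cores
    big_workers = PipelineOptions(
        runner="DataflowRunner",
        machine_type="n1-standard-64",  # 64 vCPUs per worker
        num_workers=8,
        max_num_workers=8,
    )

    # "Going wide": 128 workers x 4 vCPUs = 512 cores
    wide_workers = PipelineOptions(
        runner="DataflowRunner",
        machine_type="n1-standard-4",   # 4 vCPUs per worker
        num_workers=128,
        max_num_workers=128,
    )

Either way the service sees 512 cores of parallelism; the choice mostly affects the per-worker considerations mentioned above.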
Upvotes: 1