Farah

Reputation: 2621

GCP Dataproc : CPUs and Memory for Spark Job

I am completely new to GCP. Is it the user who has to manage the amount of memory allocated to the driver and the workers, and the number of CPUs, to run a Spark job on a Dataproc cluster? If so, what aspects of elasticity does Dataproc offer?

Thank you.

Upvotes: 1

Views: 1018

Answers (1)

Dagang Wei

Reputation: 26548

Usually you don't. The resources of a Dataproc cluster are managed by YARN, and Spark jobs are automatically configured to make use of them. In particular, Spark dynamic allocation is enabled by default. Your application code still matters, though; for example, you need to specify an appropriate number of partitions.
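
For illustration, here is a minimal PySpark sketch (the bucket paths and partition count are hypothetical) showing the split of responsibilities: executor sizing and scaling come from the cluster's defaults and dynamic allocation, while partitioning stays in your code. If you do need to override resource settings for a single job, you can also pass Spark properties at submit time, e.g. with the --properties flag of gcloud dataproc jobs submit pyspark.

    # Minimal sketch; gs:// paths and the partition count are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dataproc-example").getOrCreate()

    # On Dataproc this is "true" by default: executors are added and removed
    # by YARN as the job's demand changes, so you rarely size them by hand.
    print(spark.conf.get("spark.dynamicAllocation.enabled"))

    # What you still control is partitioning; a sensible number of partitions
    # is what lets dynamic allocation actually spread the work out.
    df = spark.read.json("gs://my-bucket/input/")   # hypothetical input path
    df = df.repartition(200)                        # e.g. a few x total cores
    df.write.parquet("gs://my-bucket/output/")      # hypothetical output path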

Upvotes: 2
