Mohan Kumar

Reputation: 111

How to run a Dataproc cluster in cluster mode?

We are trying to run Spark on a Dataproc cluster in cluster deploy mode, but cannot get it to work. We tried setting --properties spark.submit.deployMode=cluster, but it failed.

Can someone explain how to set this up?

Thanks in advance.

Upvotes: 3

Views: 344

Answers (1)

Igor Dvorzhak

Reputation: 4465

It seems the problem is that you did not specify the spark: prefix when setting the spark.submit.deployMode property during cluster creation.

In Dataproc, properties set at cluster creation time must be prefixed with the component they apply to (for example, spark: for Spark settings); see the Dataproc cluster properties documentation for details.
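To make the prefix convention concrete, here is a small hypothetical helper (describe_cluster_property is not part of gcloud) that splits a cluster property into its component prefix and key and reports which configuration file that prefix targets. The prefix-to-file mapping covers only a few common components and is an assumption based on the Dataproc cluster properties documentation; an unprefixed property such as spark.submit.deployMode=cluster matches no component, which mirrors the mistake described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: shows which config file a
# prefixed Dataproc cluster property is meant for.
# The mapping below is a partial, assumed subset of the documented prefixes.
describe_cluster_property() {
  local prop="$1"              # e.g. spark:spark.submit.deployMode=cluster
  local prefix="${prop%%:*}"   # text before the first ':' (component prefix)
  local setting="${prop#*:}"   # text after the first ':' (key=value)
  local file
  case "$prefix" in
    spark)  file="/etc/spark/conf/spark-defaults.conf" ;;
    core)   file="/etc/hadoop/conf/core-site.xml" ;;
    hdfs)   file="/etc/hadoop/conf/hdfs-site.xml" ;;
    yarn)   file="/etc/hadoop/conf/yarn-site.xml" ;;
    mapred) file="/etc/hadoop/conf/mapred-site.xml" ;;
    # No ':' (or an unknown prefix) means the property has no valid component
    *)      echo "no known component prefix in: $prop" >&2; return 1 ;;
  esac
  echo "${setting%%=*} -> $file"   # print the bare key and its target file
}

# prints: spark.submit.deployMode -> /etc/spark/conf/spark-defaults.conf
describe_cluster_property "spark:spark.submit.deployMode=cluster"
```

Calling it with the unprefixed form, describe_cluster_property "spark.submit.deployMode=cluster", returns a non-zero status instead of naming a file.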

This command should create a cluster on which Spark jobs are submitted in cluster mode by default:

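CLUSTER_NAME=<cluster_name>
REGION=<region>  # recent gcloud versions require a region for Dataproc commands
gcloud dataproc clusters create ${CLUSTER_NAME} \
  --region=${REGION} \
  --properties=spark:spark.submit.deployMode=cluster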

Note that in cluster mode, Dataproc cannot stream the Spark driver output to gcloud or the Cloud Console.
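If only some jobs should run in cluster mode, a per-job setting may be enough instead of a cluster-wide default. Job properties passed to gcloud dataproc jobs submit spark are plain Spark properties, so they are not component-prefixed. The sketch below assumes placeholder cluster and region values and uses the SparkPi example jar that ships on Dataproc images; treat it as a command fragment to adapt, not a tested recipe.

```shell
# Submit one Spark job in cluster deploy mode (no "spark:" prefix for job properties).
# <cluster_name> and <region> are placeholders to fill in.
gcloud dataproc jobs submit spark \
  --cluster=<cluster_name> \
  --region=<region> \
  --class=org.apache.spark.examples.SparkPi \
  --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
  --properties=spark.submit.deployMode=cluster \
  -- 1000
```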

Upvotes: 2
