Reputation: 111
In GCP, we want to run a Spark job in cluster mode on a Dataproc cluster. Currently we are using the following command:
gcloud dataproc jobs submit spark --cluster xxxx-xxxx-dataproc-cluster01 --region us-west2 --class xxx.xxxx.xxx.xxx.xxx.xxx.xxxx.xxxx --jars gs://xxx-xxxx-poc/cluster-compute/lib/xxxxxxxx-cluster-computation-jar-0.0.1-SNAPSHOT-allinone.jar --properties=spark:spark.submit.deployMode=cluster --properties=spark.driver.extraClassPath=/xxxx/xxxx/xxxx/ -- -c xxxxxxxx -a
However, with the above command the job is being submitted in local mode. We need it to run in cluster mode.
Upvotes: 2
Views: 2428
Reputation: 31
If you want to run the Spark job through Cloud Shell, use the command below:
gcloud dataproc jobs submit spark --cluster cluster-test
--class org.apache.spark.examples.xxxx --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000
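Note that this example does not set a deploy mode, so the job will still run with the default deploy mode. A minimal sketch of the same submission in cluster mode, assuming the SparkPi example class (the class name is redacted above) and the us-west2 region from the question:
gcloud dataproc jobs submit spark --cluster cluster-test --region us-west2 --class org.apache.spark.examples.SparkPi --jars file:///usr/lib/spark/examples/jars/spark-examples.jar --properties=spark.submit.deployMode=cluster -- 1000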
Upvotes: 0
Reputation: 2825
You can run it in cluster mode by specifying the following property: --properties spark.submit.deployMode=cluster
In your example the deployMode doesn't look correct.
--properties=spark:spark.submit.deployMode=cluster
Looks like the spark: prefix is extra.
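As a sketch, here is the question's command with the spark: prefix dropped (the redacted class, bucket, and path placeholders are kept as-is, and the two --properties flags are combined into one comma-separated list, since repeating the flag typically makes the later value replace the earlier one):
gcloud dataproc jobs submit spark --cluster xxxx-xxxx-dataproc-cluster01 --region us-west2 --class xxx.xxxx.xxx.xxx.xxx.xxx.xxxx.xxxx --jars gs://xxx-xxxx-poc/cluster-compute/lib/xxxxxxxx-cluster-computation-jar-0.0.1-SNAPSHOT-allinone.jar --properties=spark.submit.deployMode=cluster,spark.driver.extraClassPath=/xxxx/xxxx/xxxx/ -- -c xxxxxxxx -a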
Here is the entire command for the job submission:
gcloud dataproc jobs submit pyspark --cluster XXXXX --region us-central1 --properties="spark.submit.deployMode=cluster" gs://dataproc-examples/pyspark/hello-world/hello-world.py
Below is the screenshot of the job running in cluster mode
To pass multiple properties, below is the dataproc job submit command:
gcloud dataproc jobs submit pyspark --cluster cluster-e0a0 --region us-central1 --properties="spark.submit.deployMode=cluster","spark.driver.extraClassPath=/xxxxxx/configuration/cluster-mode/" gs://dataproc-examples/pyspark/hello-world/hello-world.py
On running the job, below is the screenshot which shows that the deployMode is Cluster and the extra class path is also set.
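If the screenshots are not visible, another way to confirm the properties took effect is to describe the submitted job (the job ID below is a placeholder; the real ID is printed when the job is submitted):
gcloud dataproc jobs describe <job-id> --region us-central1
The output includes the properties passed to the job, where spark.submit.deployMode should show cluster.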
Upvotes: 1