Reputation: 31
I am running a PySpark job on a Dataproc cluster. It runs fine if I don't set the master, but I am wondering how to set it. I cannot find the URL of the master node. I tried copying the master node's Compute Engine IP address and calling setMaster('spark://<MASTER_COMPUTE_ENG_ADRESS>:7077'), but it throws an error.
Can someone tell me where I can find the master node URL on GCP Dataproc, and how to set the master in a PySpark job?
Upvotes: 2
Views: 763
Reputation: 26458
Dataproc by default runs Spark jobs on YARN [1]. In the Spark config, `spark.master` is set to `yarn`, so Spark can automatically find the YARN address from the YARN config `/etc/hadoop/conf/yarn-site.xml`.
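For example, a minimal sketch of a job that relies on this default (the app name here is made up; the point is that no `.master()` call is needed):

```python
from pyspark.sql import SparkSession

# On Dataproc, don't call .master() at all -- the cluster's Spark
# config already sets spark.master=yarn.
spark = SparkSession.builder.appName("my-dataproc-job").getOrCreate()

# Confirm which master the session actually picked up;
# on a default Dataproc cluster this prints "yarn".
print(spark.sparkContext.master)

spark.stop()
```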
In general, you should not set the master explicitly on Dataproc unless you want your job to run outside of YARN. In that case, you need to first start the Spark master and workers manually to run Spark in standalone mode [2].
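If you do go the standalone route, a rough sketch of the connection side (the host name and app name are placeholders, and this assumes you have already started the standalone daemons on the cluster nodes with Spark's sbin scripts):

```python
from pyspark.sql import SparkSession

# Assumes a standalone master was started first, e.g. with
#   $SPARK_HOME/sbin/start-master.sh
# and workers attached with
#   $SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077
# <master-host> is a placeholder for your master node's hostname;
# 7077 is the default Spark standalone master port.
spark = (
    SparkSession.builder
    .appName("standalone-job")
    .master("spark://<master-host>:7077")
    .getOrCreate()
)
```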
Upvotes: 2