Reputation: 183
I have a Spark application which I am trying to run on Amazon EMR, but it either fails or stays in the running state and never finishes. The same code completes on my local machine in 2-3 minutes. I suspect an issue with the way I'm creating the Spark session. My master configuration is below:
val spark = SparkSession.builder
.master("local[2]")
.appName("Graph Creation")
.config("spark.sql.warehouse.dir", "warehouse")
.config("spark.sql.shuffle.partitions", "1")
.getOrCreate()
How can I build the Spark session so that it runs both on my local machine and on Amazon EMR without issues?
Upvotes: 0
Views: 2113
Reputation: 4750
It's better not to use a local master URL on an EMR cluster, since you won't benefit from the worker nodes. local means that Spark will run only on the machine where it is launched and won't try to use the other nodes in the cluster. The main purpose of local is local testing; whenever you want to run on a cluster you should choose a resource manager (YARN, Mesos, Spark standalone, or Kubernetes; see the Spark documentation on master URLs for more details).
You can provide the master URL as an argument to the spark-submit command: when you run locally you pass local, and on an EMR cluster you pass yarn, for example.
val spark = SparkSession.builder
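// note: no .master() call here; the master URL is supplied by spark-submit's --master flag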
.appName("Graph Creation")
.config("spark.sql.warehouse.dir", "warehouse")
.config("spark.sql.shuffle.partitions", "1")
.getOrCreate()
And then locally:
./bin/spark-submit --master local[2] ...
On EMR:
./bin/spark-submit --master yarn ...
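If you also want to run the job directly from an IDE, where no --master flag is passed, one option is to fall back to a local master only when none was supplied. A minimal sketch, assuming client deploy mode, where spark-submit exposes --master to the driver as the spark.master system property:

import org.apache.spark.sql.SparkSession

val builder = SparkSession.builder
  .appName("Graph Creation")
  .config("spark.sql.warehouse.dir", "warehouse")
  .config("spark.sql.shuffle.partitions", "1")

// spark-submit sets spark.master on the driver, so only fall back
// to local[2] when it is absent (e.g. an IDE run)
val spark = sys.props.get("spark.master") match {
  case Some(_) => builder.getOrCreate()
  case None    => builder.master("local[2]").getOrCreate()
}

This way the same jar works unchanged under spark-submit on EMR and still runs standalone during local development.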
Upvotes: 3