Reputation: 1281
I have been using spark-submit to test my code on a multi-node system (of course, I set the master option to the master server's address to get a multi-node environment). However, instead of spark-submit, I would like to use spark-shell to test my code on the cluster, and I don't know how to configure multi-node cluster settings in spark-shell. I assume that just launching spark-shell without changing any settings results in local mode.
I searched for information and came up with the commands below.
scala> sc.stop()
...
scala> import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.{SparkContext, SparkConf}
scala> val sc = new SparkContext(new SparkConf().setAppName("shell").setMaster("my server address"))
...
scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext
scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@567a2954
However, I am not sure whether this is the right way to set up spark-shell for a multi-node cluster.
Upvotes: 2
Views: 2159
Reputation: 191844
If you used setMaster("my server address") and "my server address" is not "local", then it won't run in local mode.
It is fine to set the master address in code, but in production you'd pass the --master parameter on the CLI to spark-shell or spark-submit.
You can also write a separate .scala file and pass it to spark-shell with -i <filename>.scala.
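For illustration, a minimal sketch of that workflow (the file name, master URL, and the tiny job are assumptions, not something from the question):
// test-job.scala — a hypothetical script passed to spark-shell with -i.
// Assumed invocation, using an example standalone master URL:
//   ./spark-shell --master spark://master-ip:7077 -i test-job.scala

// spark-shell already provides `sc` (SparkContext) and `spark` (SparkSession),
// so no manual context setup is needed here.
println(s"Running against master: ${sc.master}")

// A trivial distributed job to confirm work actually runs on the cluster.
val total = sc.parallelize(1 to 1000).map(_ * 2).sum()
println(s"Sum of doubled values: $total")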
Upvotes: 1
Reputation: 16086
Have you tried the --master parameter of spark-shell? For Spark Standalone:
./spark-shell --master spark://master-ip:7077
The Spark shell is just a driver; it will connect to whichever cluster you specify in the master parameter.
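As a quick sanity check once the shell is up (a minimal sketch; the master URL in the comment is just the example address above):
// Inside spark-shell, confirm which master the driver actually connected to.
// sc is the SparkContext that spark-shell creates automatically.
println(sc.master)    // e.g. spark://master-ip:7077 rather than local[*]
println(sc.isLocal)   // false when connected to a real cluster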
Edit:
For YARN, use:
./spark-shell --master yarn
Upvotes: 3