Reputation: 429
I wish to connect to a remote cluster and execute a Spark process. So, from what I have read, this is specified in the SparkConf.
val conf = new SparkConf()
.setAppName("MyAppName")
.setMaster("spark://my_ip:7077")
Where my_ip is the IP address of my cluster. Unfortunately, I get a connection refused error. So, I am guessing some credentials must be added to connect correctly. How would I specify the credentials? It seems it would be done with .set(key, value), but I have no leads on this.
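For example, something along these lines is what I had in mind; the property names (spark.authenticate, spark.authenticate.secret) and the secret value are only guesses on my part, and would only make sense if the standalone master actually has authentication enabled:

import org.apache.spark.SparkConf

// Hypothetical sketch: pass a shared secret through .set(key, value);
// only meaningful if the cluster was started with spark.authenticate=true.
val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("spark://my_ip:7077")
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "my-shared-secret") // placeholder secret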
Upvotes: 8
Views: 13009
Reputation: 5220
There are two things missing. First, you need to set the master to yarn (setMaster("yarn")) and the deploy mode to cluster; your current setup is for Spark standalone. More info here: http://spark.apache.org/docs/latest/configuration.html#application-properties
Second, you need to get the yarn-site.xml and core-site.xml files from the cluster and put them in HADOOP_CONF_DIR, so that Spark can pick up the YARN settings, such as the IP of your master node. More info: https://theckang.github.io/2015/12/31/remote-spark-jobs-on-yarn.html
By the way, this would work if you use spark-submit to submit a job; doing it programmatically is more complex to achieve, and you could only use yarn-client mode, which is tricky to set up remotely. A sketch of the programmatic setup follows below.
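For the programmatic yarn-client route mentioned above, a minimal sketch could look like this; it assumes HADOOP_CONF_DIR is already exported and points at a directory holding the cluster's yarn-site.xml and core-site.xml, and the app name and test job are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

// HADOOP_CONF_DIR must be set in the environment of the launching JVM, e.g.
//   export HADOOP_CONF_DIR=/path/to/cluster-conf
val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("yarn")                          // cluster location comes from yarn-site.xml, not an IP
  .set("spark.submit.deployMode", "client")   // programmatic submission only works in client mode

val sc = new SparkContext(conf)
println(sc.parallelize(1 to 100).sum())       // trivial job to verify the connection
sc.stop()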
Upvotes: 4
Reputation: 2281
Use --master yarn for your spark-submit command, or setMaster("yarn") in the app configuration initialization.
To run the spark-submit command from a remote host, the popular Java Secure Channel (JSch) library can be used; of course, the environment must be set up properly on the cluster. A rough sketch is given below.
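As an illustration of that idea, a rough JSch sketch might look like the following; the host, user, key path, main class, and jar path are all placeholders, and it assumes com.jcraft:jsch is on the classpath and that spark-submit plus the YARN configuration are already in place on the remote host:

import com.jcraft.jsch.{ChannelExec, JSch}

// Open an SSH session to an edge node of the cluster (placeholder credentials).
val jsch = new JSch()
jsch.addIdentity("/home/user/.ssh/id_rsa")
val session = jsch.getSession("user", "edge-node.example.com", 22)
session.setConfig("StrictHostKeyChecking", "no")
session.connect()

// Run spark-submit remotely; the environment on the edge node must already be configured.
val channel = session.openChannel("exec").asInstanceOf[ChannelExec]
channel.setCommand("spark-submit --master yarn --deploy-mode cluster " +
  "--class com.example.MyApp /path/to/my-app.jar")
channel.setErrStream(System.err)
channel.connect()

while (!channel.isClosed) Thread.sleep(500)   // wait for the remote command to finish
channel.disconnect()
session.disconnect()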
Upvotes: 0