Reputation: 508
I would like to develop a Scala application which connects to a master and runs a piece of Spark code. I would like to achieve this without using spark-submit. Is this possible? In particular, I would like to know whether the following code can run from my machine and connect to a cluster:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Driver runs locally, tasks run on the YARN cluster (client deploy mode)
val conf = new SparkConf()
  .setAppName("Meisam")
  .setMaster("yarn-client")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val df = sqlContext.sql("SELECT * FROM myTable")
...
Upvotes: 9
Views: 8878
Reputation: 56
As opposed to what has been said here, I think it's only partially possible, as I recently discovered the hard way, being the Spark newbie that I am. While you can definitely connect to a cluster as noted above and run code on it, you may run into problems as soon as you do anything non-trivial, even something as simple as using UDFs (user-defined functions, i.e. anything not already included in Spark). Have a look at https://issues.apache.org/jira/browse/SPARK-18075 and the other related tickets, and most importantly at the responses. Also, this seems useful (having a look at it now): Submitting spark app as a yarn job from Eclipse and Spark Context
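One common workaround is to ship the jar containing your own classes to the executors yourself, e.g. via SparkConf.setJars (or the spark.jars setting), so that UDF classes can be deserialized on the cluster side. A minimal sketch, assuming your classes are packaged into a jar whose path you supply yourself (the path and the plusOne UDF below are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical path to the jar that contains your application classes;
// build it beforehand (e.g. with sbt package).
val appJar = "/path/to/my-app.jar"

val conf = new SparkConf()
  .setAppName("Meisam")
  .setMaster("yarn-client")
  // Ship the jar to the executors so classes referenced in UDFs exist there
  .setJars(Seq(appJar))

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// A trivial UDF; calls like this tend to fail with ClassNotFoundException
// on the executors when the jar is not shipped.
sqlContext.udf.register("plusOne", (x: Int) => x + 1)
val df = sqlContext.sql("SELECT plusOne(id) FROM myTable")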
Upvotes: 1
Reputation: 91
Add spark.driver.host to the conf so the executors can connect back to the driver:
val conf = new SparkConf()
  .setAppName("Meisam")
  .setMaster("yarn-client")
  // the address the executors use to reach the driver
  .set("spark.driver.host", "127.0.0.1")
Upvotes: 8
Reputation: 74669
Yes, it's possible, and what you did is basically all that's needed to have tasks running on a YARN cluster in the client deploy mode (where the driver runs on the machine where the app runs). spark-submit helps you keep your code free of the few SparkConf settings that are required for proper execution, like the master URL. When you keep your code free of such low-level details, you can deploy your Spark application on any Spark cluster - YARN, Mesos, Spark Standalone and local - without recompiling it.
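To illustrate that last point, a minimal sketch (the object and jar names are made up): leave the master out of the code and let spark-submit supply it.

import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // No setMaster here: the master URL comes from spark-submit
    // (or spark-defaults.conf), so the same jar runs on any cluster manager.
    val conf = new SparkConf().setAppName("Meisam")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 10).sum())
    sc.stop()
  }
}

and then, for example:

spark-submit --master yarn-client --class MyApp my-app.jar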
Upvotes: 5