Meisam Emamjome

Reputation: 508

Can Spark code be run on cluster without spark-submit?

I would like to develop a Scala application that connects to a master and runs a piece of Spark code, without using spark-submit. Is this possible? In particular, I would like to know whether the following code can run on my machine and connect to a cluster:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("Meisam")
  .setMaster("yarn-client")

val sc = new SparkContext(conf)

val sqlContext = new SQLContext(sc)
val df = sqlContext.sql("SELECT * FROM myTable")

...

Upvotes: 9

Views: 8878

Answers (3)

Ido.Schwartzman

Reputation: 56

Contrary to what the other answers say, I think it's only partially possible, as I recently discovered the hard way as a Spark newcomer. You can certainly connect to a cluster as the other answers describe and run code on it, but you may run into problems as soon as you do anything non-trivial, even something as simple as using UDFs (user-defined functions, i.e. anything not already built into Spark). Have a look at https://issues.apache.org/jira/browse/SPARK-18075 and the related tickets, and most importantly at the responses.

This also seems useful (I'm reading it now): Submitting spark app as a yarn job from Eclipse and Spark Context
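For what it's worth, one commonly suggested workaround for this kind of failure is to ship your packaged application jar to the executors explicitly, on the assumption that the problem is the executors not being able to load classes that only exist on your machine. A minimal sketch follows; the object name and the jar path are hypothetical placeholders, and I can't promise this covers every case discussed in the ticket:

```scala
import org.apache.spark.SparkConf

object ShipJarExample {
  // Hypothetical path: wherever your build tool writes the packaged application jar.
  val appJar = "target/scala-2.11/my-app.jar"

  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("Meisam")
      .setMaster("yarn-client") // same master setting as in the question
      // Asks Spark to distribute this jar to the cluster so that executors
      // can load your own classes (for example, the closures behind UDFs).
      .setJars(Seq(appJar))
}
```

You would then pass ShipJarExample.buildConf() to new SparkContext(...) exactly as in the question, after rebuilding the jar whenever the code changes.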

Upvotes: 1

xfreewind

Reputation: 91

Add a conf setting:

val conf = new SparkConf()
  .setAppName("Meisam")
  .setMaster("yarn-client")
  .set("spark.driver.host", "127.0.0.1")

Upvotes: 8

Jacek Laskowski

Reputation: 74669

Yes, it's possible, and what you did is basically all that's needed to have tasks running on a YARN cluster in client deploy mode (where the driver runs on the machine where the application is launched).

spark-submit lets you keep your code free of a few SparkConf settings that are required for proper execution, such as the master URL. When you keep your code free of these low-level details, you can deploy your Spark application on any Spark cluster (YARN, Mesos, Spark Standalone, or local) without recompiling it.
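To illustrate the point about keeping the master URL out of the code, here is a minimal sketch. The object name PortableApp and the local[*] fallback are my own assumptions for illustration and not part of the question; the idea is that new SparkConf() picks up spark.* JVM system properties, which is how a launcher such as spark-submit can supply the master:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PortableApp {
  // No setMaster call here: new SparkConf() loads any spark.* JVM system
  // properties, so the master can be supplied from outside the code.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("Meisam")
      // Assumption for illustration: fall back to local mode only when
      // no master was supplied by the launcher.
      .setIfMissing("spark.master", "local[*]")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      // Trivial job to confirm the context works on whatever master was chosen.
      println(sc.parallelize(1 to 100).sum())
    } finally {
      sc.stop()
    }
  }
}
```

The same jar could then be launched with something like spark-submit --master yarn --deploy-mode client --class PortableApp <your-app.jar>, or run directly from an IDE, where it falls back to local mode.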

Upvotes: 5
