Reputation: 2535
I am very new to Spark, just learning, so please bear with me if I talk like a novice.
I have a regular Java jar which is self-contained.
Its function is to listen to a queue and process some messages. The new requirement is to read from the queue in a distributed fashion, so I have a Spark master and three slaves managed by YARN. When I ./spark-submit this jar on the standalone master, everything works fine. When I switch to cluster mode by setting YARN as the master on the command line, I get lots of file-not-found errors from HDFS. I read up on Stack Overflow and saw that I have to create a SparkContext, but I see no use for it in my case.
Here is my question: do I still have to use the following?
SparkConf conf = new SparkConf().setMaster("yarn-cluster").setAppName("TibcoMessageConsumer");
SparkContext sparkContext = new SparkContext(conf);
I don't see any usage of sparkContext in my case.
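For reference, this is roughly how I understand the context would be wired into the driver if it is needed; only the app name comes from my snippet above, the rest is just a sketch:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class TibcoMessageConsumer {
    public static void main(String[] args) {
        // Leave the master out of the code and pass --master to spark-submit,
        // so the same jar can run locally and on YARN.
        SparkConf conf = new SparkConf().setAppName("TibcoMessageConsumer");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // ... distribute the queue processing through sc here ...
        } finally {
            sc.stop();
        }
    }
}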
Upvotes: 2
Views: 7600
Reputation: 294
Since you are using YARN, copy the jar to HDFS and then you can reference it in spark-submit. If you want to use the local file system, you have to copy that jar to all the worker nodes [not recommended].
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode cluster \
myapp-jar
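For example, assuming a hypothetical HDFS path /user/spark/apps and the main class from the question (substitute your own values), the two steps might look like this:
hdfs dfs -put myapp-jar /user/spark/apps/myapp-jar

./bin/spark-submit \
  --class TibcoMessageConsumer \
  --master yarn \
  --deploy-mode cluster \
  hdfs:///user/spark/apps/myapp-jar
With the jar on HDFS, every node can fetch it on its own, which is what avoids the file-not-found errors described in the question.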
You can look at this link for more details.
Upvotes: 1