Reputation: 517
What are the steps to run Spark on YARN?
What I have done so far: created a user named yarn and installed both Spark and Hadoop, then ran a Spark job locally. I need help with the configuration, especially the (client-side) configuration files for the Hadoop cluster. I have been unable to figure out where to put them or how to link them, and I have been getting errors for a long time now.
See this related question: spark-submit unable to connect
Upvotes: 0
Views: 926
Reputation: 1034
STEP 1: Configure YARN properly (yarn-site.xml), using an online reference if needed.
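For orientation, a minimal single-node yarn-site.xml might look like the sketch below; the hostname is an assumption, and the values must match your own cluster:

<configuration>
  <!-- Host where the ResourceManager runs; "localhost" is an
       assumption for a single-node setup. -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <!-- Auxiliary shuffle service required by MapReduce jobs such as
       the wordcount sanity check below. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Then, as a sanity check that YARN is installed properly, run the command below: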
yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /tmp/abhijeet/hadoop.in /tmp/abhijeet/out/out.1
If this works, you're good to go.
STEP 2: Install Spark with reference to some online content, then check that Spark itself is installed properly by running the command below:
/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[*] /opt/spark/examples/jars/spark-examples_2.11-2.1.1.jar
If this works (the output should include a line like "Pi is roughly 3.14..."), Spark is installed properly.
STEP 3: Now it's time to run Spark on YARN. This is where the (client-side) Hadoop configuration files from the question come in, because spark-submit uses them to find the cluster.
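Spark needs HADOOP_CONF_DIR (or YARN_CONF_DIR) to point to the directory containing the client-side configuration files for the Hadoop cluster (core-site.xml, yarn-site.xml); it uses them to write to HDFS and to connect to the YARN ResourceManager. A minimal sketch, assuming the configs live under /opt/hadoop/etc/hadoop (adjust for your install):

# Point Spark at the directory holding the client-side Hadoop configs.
# /opt/hadoop/etc/hadoop is an assumed path; use your own install's.
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

# To make this permanent, put the same line in /opt/spark/conf/spark-env.sh.

With that in place, run the command given below: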
/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-memory 2G --num-executors 2 /opt/spark/examples/jars/spark-examples_2.11-2.1.1.jar
If this one works as well, then congratulations!
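One note on --deploy-mode cluster: the driver runs inside YARN, so the Pi result appears in the application logs rather than in your terminal. You can fetch them with the command below (the application id is printed by spark-submit and shown in the ResourceManager UI):

# Retrieve the logs of a finished YARN application.
yarn logs -applicationId <application_id>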
NOTE: The paths given above are local to my system, and all the jars used come with the default Hadoop and Spark packages.
Upvotes: 0
Reputation: 676
I guess this is what you're looking for.
You probably know that YARN provides the resources to run your jobs, so you have to set the master to yarn in your code (or on the spark-submit command line), and then upload the data to HDFS before running the Spark jobs. The Apache docs on running Spark on YARN cover this in detail.
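As a rough sketch of those two steps (all paths and the jar name are placeholders, not from the original answer):

# Upload the input data to HDFS so the YARN executors can read it.
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put data.txt /user/hadoop/input/

# Submit with yarn as the master; my-app.jar stands in for your own
# application jar.
/opt/spark/bin/spark-submit --master yarn --deploy-mode client my-app.jar hdfs:///user/hadoop/input/data.txt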
Upvotes: 0