Reputation: 895
I'm trying to run a spark job with yarn using:
./bin/spark-submit --class "KafkaToMaprfs" --master yarn --deploy-mode client /home/mapr/kafkaToMaprfs/target/scala-2.10/KafkaToMaprfs.jar
But facing this error:
/opt/mapr/hadoop/hadoop-2.7.0 17/01/03 11:19:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/01/03 11:19:38 ERROR SparkContext: Error initializing SparkContext. org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) at org.apache.spark.SparkContext.(SparkContext.scala:530) at KafkaToMaprfs$.main(KafkaToMaprfs.scala:61) at KafkaToMaprfs.main(KafkaToMaprfs.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:752) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 17/01/03 11:19:39 WARN MetricsSystem: Stopping a MetricsSystem that is not running Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) at org.apache.spark.SparkContext.(SparkContext.scala:530) at KafkaToMaprfs$.main(KafkaToMaprfs.scala:61) at KafkaToMaprfs.main(KafkaToMaprfs.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:752) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I have a multi node cluster, i'm deploying the application from a remote node. I'm using spark 1.6.1 and hadoop 2.7.x versions.
I didn't set the cluster, so I couldn't find where the mistake lies.
Can anyone please help me fix this?
Upvotes: 1
Views: 1636
Reputation: 895
In my case i'm using MapR distribution.And i didn't configure the environment. So, when i dug down to the all the conf folders.I made some changes in the below files,
1. In Spark-env.sh,Make sure these values are set right.
export SPARK_LOG_DIR=
export SPARK_PID_DIR=
export HADOOP_HOME=
export HADOOP_CONF_DIR=
export JAVA_HOME=
export SPARK_SUBMIT_OPTIONS=
2. yarn-env.sh.
Make sure the yarn_conf_dir, and java_home are set with correct values.
3. In spark-defaults.conf
1.spark.driver.extraClassPath 2.set value for HADOOP_CONF_DIR
4. HADOOP_CONF_DIR and JAVA_HOME in $SPARK_HOME/conf/spark-env.sh
1.export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
2.export JAVA_HOME =
5.spark assembly jar
1.Copy the following JAR file from the local file system to a world-readable location on MapR-FS: Substitute your Spark version and specific JAR file name in the command. /opt/mapr/spark/spark-/lib/spark-assembly--hadoop-mapr-.jar
Now i'm able to run my spark application as YARN-CLIENT smoothly using spark-submit. These are basic essentials to make spark connect with yarn. Correct me if i missed any other things.
Upvotes: 1