Ramkumar V

Reputation: 11

Spark Streaming failing on YARN Cluster

I have a cluster with 1 master and 2 slaves. I'm running a Spark Streaming job from the master and I want to utilize all nodes in my cluster. I had specified some parameters like driver memory and executor memory in my code. When I pass --deploy-mode cluster --master yarn-cluster to spark-submit, it fails with the following error.
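For reference, the submit command looks roughly like this sketch (the file paths come from the log below; the remaining flags are an assumption):

spark-submit --master yarn-cluster --deploy-mode cluster --jars /home/hdfs/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1.jar /home/hdfs/spark-1.4.1/examples/src/main/python/streaming/kyt.py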

log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/08/12 13:24:49 INFO Client: Requesting a new application from cluster with 3 NodeManagers
15/08/12 13:24:49 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/08/12 13:24:49 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/08/12 13:24:49 INFO Client: Setting up container launch context for our AM
15/08/12 13:24:49 INFO Client: Preparing resources for our AM container
15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/assembly/target/scala-2.10/spark-assembly-1.4.1-hadoop2.5.0-cdh5.3.5.jar
15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1.jar
15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/python/lib/pyspark.zip
15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/python/lib/py4j-0.8.2.1-src.zip
15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/examples/src/main/python/streaming/kyt.py
15/08/12 13:24:49 INFO Client: Setting up the launch environment for our AM container
15/08/12 13:24:49 INFO SecurityManager: Changing view acls to: hdfs
15/08/12 13:24:49 INFO SecurityManager: Changing modify acls to: hdfs
15/08/12 13:24:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs)
15/08/12 13:24:49 INFO Client: Submitting application 3808 to ResourceManager
15/08/12 13:24:49 INFO YarnClientImpl: Submitted application application_1437639737006_3808
15/08/12 13:24:50 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:24:50 INFO Client: 
   client token: N/A
   diagnostics: N/A
   ApplicationMaster host: N/A
   ApplicationMaster RPC port: -1
   queue: root.hdfs
   start time: 1439385889600
   final status: UNDEFINED
   tracking URL: http://hostname:port/proxy/application_1437639737006_3808/
   user: hdfs
15/08/12 13:24:51 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:24:52 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:24:53 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:24:54 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:24:55 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:24:56 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:24:57 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:24:58 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:24:59 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:25:00 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:25:01 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:25:02 INFO Client: Application report for application_1437639737006_3808 (state: ACCEPTED)
15/08/12 13:25:03 INFO Client: Application report for application_1437639737006_3808 (state: FAILED)
15/08/12 13:25:03 INFO Client: 
   client token: N/A
   diagnostics: Application application_1437639737006_3808 failed 2 times due to AM Container for appattempt_1437639737006_3808_000002 exited with  exitCode: -1000 due to: File file:/home/hdfs/spark-1.4.1/python/lib/pyspark.zip does not exist
.Failing this attempt.. Failing the application.
   ApplicationMaster host: N/A
   ApplicationMaster RPC port: -1
   queue: root.hdfs
   start time: 1439385889600
   final status: FAILED
   tracking URL: http://hostname:port/cluster/app/application_1437639737006_3808
   user: hdfs
Exception in thread "main" org.apache.spark.SparkException: Application application_1437639737006_3808 finished with failed status
  at org.apache.spark.deploy.yarn.Client.run(Client.scala:855)
  at org.apache.spark.deploy.yarn.Client$.main(Client.scala:881)
  at org.apache.spark.deploy.yarn.Client.main(Client.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

How do I fix this issue? Please let me know if I'm doing something wrong.

Upvotes: 1

Views: 2736

Answers (4)

G.V. Sridhar

Reputation: 139

I was able to solve this by providing the driver memory and executor memory at run time.

spark-submit --driver-memory 1g --executor-memory 1g --class com.package.App --master yarn --deploy-mode cluster /home/spark.jar

Upvotes: 0

wanbo

Reputation: 87

The file file:/home/hdfs/spark-1.4.1/python/lib/pyspark.zip that you submit does not exist on the node where the AM container is launched (see the diagnostics line in your log). With a local file: path, the file must be present at that same path on every node, or it should be placed somewhere all nodes can reach, such as HDFS.
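One way to make it resolvable from every node is to stage the zip on HDFS and reference it there; this is only a sketch, and the HDFS destination path is an assumption:

hdfs dfs -put /home/hdfs/spark-1.4.1/python/lib/pyspark.zip /user/hdfs/pyspark.zip
spark-submit --master yarn-cluster --py-files hdfs:///user/hdfs/pyspark.zip /home/hdfs/spark-1.4.1/examples/src/main/python/streaming/kyt.py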

Upvotes: 1

Sidharth

Reputation: 11

I recently ran into the same issue. Here was my scenario:

A Cloudera-managed CDH 5.3.3 cluster with 7 nodes. I was submitting the job from one of the nodes, and it used to fail in both yarn-cluster and yarn-client modes with the same issue.

If you look at the stack trace, you'll find these lines:

15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1.jar
15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/python/lib/pyspark.zip
15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/python/lib/py4j-0.8.2.1-src.zip
15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/examples/src/main/python/streaming/kyt.py

This is why the job fails: the resources are never copied to the cluster.

In my case, it was resolved by correcting the HADOOP_CONF_DIR path. It wasn't pointing to the folder that contains core-site.xml, yarn-site.xml, and the other configuration files. Once this was fixed, the resources were copied during the initialization of the ApplicationMaster and the job ran correctly.
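For example, on a CDH-managed node the client configuration typically lives under /etc/hadoop/conf (an assumption; point it at whichever directory actually holds your core-site.xml and yarn-site.xml):

export HADOOP_CONF_DIR=/etc/hadoop/conf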

Upvotes: 0

Murtaza Kanchwala

Reputation: 2483

When running in YARN cluster mode, you always need to specify the executor settings (number of executors and the memory for each), and you always need to specify the driver memory as well. For example:

Amazon EC2 environment (some resources already reserved):

m3.xlarge | Cores: 4 (1 reserved) | RAM: 15 GB (3.5 GB reserved) | HDD: 80 GB | Nodes: 3

spark-submit --class <YourClassFollowedByPackage> --master yarn-cluster --num-executors 2 --driver-memory 8g --executor-memory 8g --executor-cores 1 <Your Jar with Full Path> <Jar Args>

Always remember to add any third-party libraries or jars to the classpath on each of the task nodes; you can add them directly to the Spark or Hadoop classpath on each node.
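For example, a sketch of the spark-defaults.conf entries you could set on each node (the /opt/extra-jars directory is an assumption; use wherever you keep the third-party jars):

spark.driver.extraClassPath    /opt/extra-jars/*
spark.executor.extraClassPath  /opt/extra-jars/*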

Notes: 1) If you're using Amazon EMR, this can be achieved using custom bootstrap actions and S3. 2) Remove any conflicting jars as well; sometimes you'll see a spurious NullPointerException, and conflicting jars can be one of the key reasons for it.
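As a sketch of note 1 (the S3 bucket and target directory are assumptions), a bootstrap action can copy the jars from S3 onto every node before the cluster starts:

#!/bin/bash
# Copy third-party jars from a hypothetical S3 bucket to a local directory on every node
aws s3 cp s3://my-bucket/spark-libs/ /home/hadoop/extra-jars/ --recursive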

If possible, add your full stack trace, which you can fetch with

yarn logs -applicationId <HadoopAppId>

so that I can answer in a more specific way.

Upvotes: 0
