Alok

Reputation: 1506

Spark Streaming stops after sometime due to Executor Lost

I am using Spark 1.3 for a Spark Streaming application. When I start my application, I can see in the Spark UI that a few of the jobs have failed tasks. On investigating the job details, I see that some tasks failed due to an Executor Lost exception, either ExecutorLostFailure (executor 11 lost) or Resubmitted (resubmitted due to lost executor). In the application logs from YARN, the only error shown is Lost executor 11 on <machineip>: remote Akka client disassociated. I don't see any other exception or error being thrown.

The application stops after a couple of hours. The logs show that all the executors have been lost by the time the application fails. Can anyone suggest how to resolve this issue, or point to a link that explains it?

Upvotes: 0

Views: 989

Answers (1)

Bryan

Reputation: 71

There are many potential reasons why you're seeing executor loss. One thing I have observed in the past is that Java garbage collection can pause for very long periods under heavy load. As a result, the executor is 'lost' when a GC pause runs too long for the driver's liking, and it returns shortly thereafter.

You can determine if this is the issue by turning on executor GC logging. Simply add the following configuration:

--conf "spark.executor.extraJavaOptions=-XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy"
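For example, a full spark-submit invocation might look like the sketch below (the master setting and your-streaming-app.jar are placeholders; substitute your own deployment details):

spark-submit \
  --master yarn \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy" \
  your-streaming-app.jar

With -verbose:gc and no explicit log file, the GC output goes to each executor's stdout, which you can open from the executor pages in the Spark or YARN UI. If you see long full-GC pauses around the times executors are reported lost, GC is likely your culprit.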

See this great guide from Intel/Databricks for more details on GC tuning: https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html

Upvotes: 1
