Reputation: 645
I am running a Bash Script in MAC. This script calls a spark method written in Scala language for a large number of times. I am currently trying to call this spark method for 100,000 times using a for loop.
The code exits with the following exception after running a small number of iterations, around 3000 iterations.
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
Exception in thread "dag-scheduler-event-loop" 16/11/22 13:37:32 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
at io.netty.util.internal.MpscLinkedQueue.offer(MpscLinkedQueue.java:126)
at io.netty.util.internal.MpscLinkedQueue.add(MpscLinkedQueue.java:221)
at io.netty.util.concurrent.SingleThreadEventExecutor.fetchFromScheduledTaskQueue(SingleThreadEventExecutor.java:259)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:346)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
at java.util.regex.Pattern.compile(Pattern.java:1047)
at java.lang.String.replace(String.java:2180)
at org.apache.spark.util.Utils$.getFormattedClassName(Utils.scala:1728)
at org.apache.spark.storage.RDDInfo$$anonfun$1.apply(RDDInfo.scala:57)
at org.apache.spark.storage.RDDInfo$$anonfun$1.apply(RDDInfo.scala:57)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.storage.RDDInfo$.fromRdd(RDDInfo.scala:57)
at org.apache.spark.scheduler.StageInfo$$anonfun$1.apply(StageInfo.scala:87)
Can someone help please, is this error being caused because of a large number of calls to spark method?
Upvotes: 17
Views: 61495
Reputation: 13589
For posterity: I was getting similar errors, but changing memory/timeout settings was not helping at all.
In my case the problem was that somebody was calling socket.setdefaulttimeout
in a library function that I was calling before creating the Spark session. setdefaulttimeout
affected all new sockets created after that point, including the socket that Spark used to communicate with YARN, so that connection would time out unexpectedly.
Needless to say, don't do this.
Upvotes: 0
Reputation: 46
You are seeing this issue due to the executor memory. Try increasing the memory to (x 2) so the containers don't time out while waiting on the remaining containers.
Upvotes: 1
Reputation: 3692
The above stack trace is also shown java heap space its OOM error so once try to increase the memory and run it and regarding timeout its rpc timeout so you can set spark.network.timeout
with timeout value according to your need...
Upvotes: 5
Reputation: 595
Just increase the spark.executor.heartbeatInterval
to 20s, the error says that.
Upvotes: 1
Reputation: 237
pls increase the executer memory so that OOM will go away else make chnage in code so that your RDD
wont have big memory foot print.
--executer-memory = 3G
Upvotes: 1
Reputation: 29155
Its RpcTimeoutException
.. so spark.network.timeout
(spark.rpc.askTimeout
) could be tuned with larger-than-default values in order to handle complex workload. You can start with these values and adjust accordingly to your workloads.
Please see latest
spark.network.timeout
120s Default timeout for all network interactions. This config will be used in place of spark.core.connection.ack.wait.timeout, spark.storage.blockManagerSlaveTimeoutMs, spark.shuffle.io.connectionTimeout, spark.rpc.askTimeout or spark.rpc.lookupTimeout if they are not configured.
Also consider increasing executor memory i.e spark.executor.memory
and most imp thing is review your code, to check whether that is candidate for further optimization.
set by SparkConf: conf.set("spark.network.timeout", "600s")
set by spark-defaults.conf: spark.network.timeout 600s
set when calling spark-submit: --conf spark.network.timeout=600s
Upvotes: 22