Miren

Reputation: 431

apache-spark org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds

I have configured a Spark cluster in standalone mode (the cluster configuration is automatic). I can see that both workers are running, but when I start a spark-shell I run into the following problem.

val lines = sc.parallelize(List(1, 2, 3, 4))

This works fine and the new RDD is created, but when I run the next action:

lines.take(2).foreach(println) 

I get this error, which I can't solve:

Output:

 16/02/18 10:27:02 INFO DAGScheduler: Got job 0 (take at :24) with 1 output partitions
 16/02/18 10:27:02 INFO DAGScheduler: Final stage: ResultStage 0 (take at :24)
 16/02/18 10:27:02 INFO DAGScheduler: Parents of final stage: List() 
 16/02/18 10:27:02 INFO DAGScheduler: Missing parents: List() 
 16/02/18 10:27:02 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at :21), which has no missing parents
 16/02/18 10:27:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1288.0 B, free 1288.0 B)
 16/02/18 10:27:04 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 856.0 B, free 2.1 KB)

A minute and a half later:

16/02/18 10:28:43 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(0,java.io.IOException: Failed to create directory /srv/spark/work/app-20160218102438-0000/0)] in 2 attempts org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) 
    at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
    at scala.util.Try$.apply(Try.scala:161)
    at scala.util.Failure.recover(Try.scala:185)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324) 
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324) 
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293) 
    at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133) 
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) 
    at scala.concurrent.Promise$class.complete(Promise.scala:55) 
    at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) 
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) 
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) 
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658)
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) 
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634) 
    at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) 
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685) 
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) 
    at scala.concurrent.Promise$class.tryFailure(Promise.scala:112) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153) 
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:241) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds 
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:242) ... 7 more

In one worker's log, I can see this error:

Invalid maximum heap size: -Xmxm0M; could not create JVM

I can also see some messages that I think are related to a problem binding a port, or something like that.
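
For reference, the mangled -Xmxm0M flag looks like what a worker builds when the memory value it reads is empty or malformed. Here is a minimal sketch of giving the memory an explicit, well-formed value from a standalone application; the master URL and the "1g" size are illustrative, not my actual values:

    import org.apache.spark.{SparkConf, SparkContext}

    // Give the executor memory an explicit, well-formed value so the worker
    // does not assemble a broken JVM flag such as "-Xmxm0M".
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077") // illustrative master URL
      .setAppName("memory-check")
      .set("spark.executor.memory", "1g")    // illustrative size
    val sc = new SparkContext(conf)

From spark-shell, the equivalent is passing --conf spark.executor.memory=1g on the command line.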

Upvotes: 4

Views: 15198

Answers (2)

pooja shinde

Reputation: 11

If you are working on a VM, you need to have at least 2 CPU cores. You have to set this in the VM's configuration.
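
A quick way to verify this from inside spark-shell (plain JVM API, nothing Spark-specific) is to check how many cores the JVM actually sees:

    // Prints the number of CPU cores visible to the JVM; if this prints 1,
    // the VM needs more vCPUs before Spark can make progress.
    println(Runtime.getRuntime.availableProcessors())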

Upvotes: 1

Nawa

Reputation: 2158

The problem may be in the area of port visibility between the Spark cluster and the client instance.

It seems strange from the user's side, but it is a feature of Spark's architecture: each Spark node must be able to reach the client instance on the specific port defined by spark.driver.port in the SparkContext configuration. This option is empty by default, which means the port is chosen randomly. As a result, with the default configuration every Spark node needs to be able to reach any port of the client instance. But you can override spark.driver.port.

This can be a problem if your client machine is behind a firewall or inside a Docker container, for example. You need to open this port to the outside.
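
A minimal sketch of pinning the driver port before creating the context; the port number and host names are illustrative, not taken from any real setup:

    import org.apache.spark.{SparkConf, SparkContext}

    // Pin the driver's RPC port so the firewall only has to allow one known
    // port toward the client instead of a randomly chosen one.
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")   // illustrative master URL
      .setAppName("fixed-driver-port")
      .set("spark.driver.port", "51000")       // illustrative fixed port
      .set("spark.driver.host", "client-host") // address the workers can reach
    val sc = new SparkContext(conf)

With this in place, only port 51000 has to be opened on the client's firewall (plus the block manager port, if you also pin spark.blockManager.port).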

Upvotes: 4
