Reputation: 5178
I'm trying to make a hello world example work with spark+docker, and here is my code.
import org.apache.spark.SparkContext

object Generic {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://172.17.0.3:7077", "Generic", "/opt/spark-0.9.0")
    val NUM_SAMPLES = 100000
    // Monte Carlo estimate of Pi: sample points in the unit square and
    // count the fraction that falls inside the unit circle.
    val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
      val x = Math.random * 2 - 1
      val y = Math.random * 2 - 1
      if (x * x + y * y < 1) 1.0 else 0.0
    }.reduce(_ + _)
    println("Pi is roughly " + 4 * count / NUM_SAMPLES)
  }
}
When I run sbt run, I get
14/05/28 15:19:58 INFO client.AppClient$ClientActor: Connecting to master spark://172.17.0.3:7077...
14/05/28 15:20:08 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
I checked both the cluster UI, where I have 3 nodes that each have 1.5 GB of memory, and the namenode UI, where I see the same thing.
The Docker logs show no output from the workers, and the following from the master:
14/05/28 21:20:38 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@master:7077] -> [akka.tcp://[email protected]:48085]: Error [Association failed with [akka.tcp://[email protected]:48085]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://[email protected]:48085]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /10.0.3.1:48085
]
This happens a couple of times, and then the program times out and dies with:
[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Spark cluster looks down
When I did a tcpdump over the docker0 interface, it looked like the workers and the master were talking.
However, the Spark console works. If I set sc to
val sc = new SparkContext("local", "Generic", System.getenv("SPARK_HOME"))
the program runs.
Upvotes: 8
Views: 1904
Reputation: 857
You have to check the firewall if you are on a Windows host and make sure java.exe is allowed to access the public network, or change DockerNAT to private. In general, the worker must be able to connect back to the driver (the program you submitted).
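If allowing java.exe wholesale feels too broad, a narrower option is to pin the port the driver listens on, since by default Spark picks a random port on every run and a firewall rule cannot anticipate it. Below is a minimal sketch, assuming the Spark 0.9-era SparkConf API; the object name and the port number 7001 are my own placeholders, while the master URL is the one from the question:

import org.apache.spark.{SparkConf, SparkContext}

object GenericWithFixedPort {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("spark://172.17.0.3:7077")
      .setAppName("Generic")
      // Fix the port the workers connect back to; Spark otherwise
      // chooses a random port, so a single firewall rule can't cover it.
      .set("spark.driver.port", "7001")
    val sc = new SparkContext(conf)
    // ... the Pi job from the question goes here ...
  }
}

With the port fixed, the firewall rule for the driver's Akka endpoint can name a concrete port instead of a random one (other Spark services may still use additional ports).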
Upvotes: 0
Reputation: 651
For running Spark on Docker it's crucial to get the container networking right. Without handling all 3 issues covered in the post below, the Spark cluster parts (master, worker, driver) can't communicate. You can read about each issue in detail at http://sometechshit.blogspot.ru/2015/04/running-spark-standalone-cluster-in.html or use a container ready for Spark from https://registry.hub.docker.com/u/epahomov/docker-spark/
Upvotes: 0
Reputation: 37435
I've been there. The issue looks like the Akka actor subsystem in Spark is binding to a different interface than docker0.
While your master IP is: spark://172.17.0.3:7077
Akka is binding to: akka.tcp://[email protected]:48085
If your masters/slaves are Docker containers, they should be communicating through the docker0 interface in the 172.17.x.x range.
Try providing the master and slaves with their correct local IP using the env config SPARK_LOCAL_IP. See the config docs for details.
In our docker setup for Spark 0.9 we are using this command to start the slaves:
${SPARK_HOME}/bin/spark-class org.apache.spark.deploy.worker.Worker $MASTER_IP -i $LOCAL_IP
This directly provides the local IP to the worker.
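The driver from the question shows the matching symptom on its side: the error log has Akka advertising 10.0.3.1 rather than a 172.17.x.x address. Here is a minimal sketch of a driver-side fix, assuming Spark 0.9 still picks up spark.* system properties when the context is created and that SPARK_LOCAL_IP is also exported where the driver runs; the fallback address 172.17.42.1 is just a placeholder for the driver host's docker0 IP:

import org.apache.spark.SparkContext

object Generic {
  def main(args: Array[String]) {
    // Advertise the driver on its docker0 address instead of 10.0.3.1.
    // Reuses the SPARK_LOCAL_IP convention from this answer; the fallback
    // 172.17.42.1 is a placeholder.
    System.setProperty("spark.driver.host",
      sys.env.getOrElse("SPARK_LOCAL_IP", "172.17.42.1"))
    val sc = new SparkContext("spark://172.17.0.3:7077", "Generic", "/opt/spark-0.9.0")
    // ... the Pi job from the question goes here ...
  }
}

The same effect can be had by setting spark.driver.host through SparkConf; the system-property route just keeps the question's three-argument SparkContext constructor.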
Upvotes: 5