Reputation: 5178
I'm trying to make a hello world example work with spark+docker, and here is my code.
import org.apache.spark.SparkContext

object Generic {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://172.17.0.3:7077", "Generic", "/opt/spark-0.9.0")
    val NUM_SAMPLES = 100000
    // Monte Carlo estimate of Pi: sample points in the unit square and
    // count the fraction that falls inside the unit circle.
    val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
      val x = Math.random * 2 - 1
      val y = Math.random * 2 - 1
      if (x * x + y * y < 1) 1.0 else 0.0
    }.reduce(_ + _)
    println("Pi is roughly " + 4 * count / NUM_SAMPLES)
  }
}
When I run sbt run, I get
14/05/28 15:19:58 INFO client.AppClient$ClientActor: Connecting to master spark://172.17.0.3:7077...
14/05/28 15:20:08 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
I checked both the cluster UI, where I have 3 nodes that each have 1.5 GB of memory, and the namenode UI, where I see the same thing.
The Docker logs show no output from the workers, and the following from the master:
14/05/28 21:20:38 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@master:7077] -> [akka.tcp://[email protected]:48085]: Error [Association failed with [akka.tcp://[email protected]:48085]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://[email protected]:48085]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /10.0.3.1:48085
]
This happens a couple of times, and then the program times out and dies with:
[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Spark cluster looks down
When I did a tcpdump over the docker0 interface, it looked like the workers and the master were talking.
However, the Spark console works. If I set sc to
val sc = new SparkContext("local", "Generic", System.getenv("SPARK_HOME"))
the program runs.
Upvotes: 8
Views: 1904
Reputation: 857
You have to check the firewall if you are on a Windows host and make sure java.exe is allowed to access the public network, or change DockerNAT to private. In general, the worker must be able to connect back to the driver (the program you submitted).
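If allowing java.exe wholesale feels too broad, a narrower option is to pin the port the driver listens on, since by default Spark picks a random port on every run and a firewall rule cannot anticipate it. Below is a minimal sketch, assuming the Spark 0.9-era SparkConf API; the object name and the port number 7001 are my own placeholders, while the master URL is the one from the question:

import org.apache.spark.{SparkConf, SparkContext}

object GenericWithFixedPort {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("spark://172.17.0.3:7077")
      .setAppName("Generic")
      // Fix the port the workers connect back to; Spark otherwise
      // chooses a random port, so a single firewall rule can't cover it.
      .set("spark.driver.port", "7001")
    val sc = new SparkContext(conf)
    // ... the Pi job from the question goes here ...
  }
}

With the port fixed, the firewall rule for the driver's Akka endpoint can name a concrete port instead of a random one (other Spark services may still use additional ports).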
Upvotes: 0
Reputation: 651
For running Spark on Docker it's crucial to get the container networking right. Without handling all 3 issues covered in the post below, the Spark cluster parts (master, worker, driver) can't communicate. You can read about each issue in detail at http://sometechshit.blogspot.ru/2015/04/running-spark-standalone-cluster-in.html or use a container ready for Spark from https://registry.hub.docker.com/u/epahomov/docker-spark/
Upvotes: 0
Reputation: 37435
I've been there. The issue looks like the Akka actor subsystem in Spark is binding to a different interface than docker0.
While your master IP is: spark://172.17.0.3:7077
Akka is binding to: akka.tcp://[email protected]:48085
If your masters/slaves are Docker containers, they should be communicating through the docker0 interface in the 172.17.x.x range.
Try providing the master and slaves with their correct local IP using the env config SPARK_LOCAL_IP. See the config docs for details.
In our docker setup for Spark 0.9 we are using this command to start the slaves:
${SPARK_HOME}/bin/spark-class org.apache.spark.deploy.worker.Worker $MASTER_IP -i $LOCAL_IP
This directly provides the local IP to the worker.
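The driver from the question shows the matching symptom on its side: the error log has Akka advertising 10.0.3.1 rather than a 172.17.x.x address. Here is a minimal sketch of a driver-side fix, assuming Spark 0.9 still picks up spark.* system properties when the context is created and that SPARK_LOCAL_IP is also exported where the driver runs; the fallback address 172.17.42.1 is just a placeholder for the driver host's docker0 IP:

import org.apache.spark.SparkContext

object Generic {
  def main(args: Array[String]) {
    // Advertise the driver on its docker0 address instead of 10.0.3.1.
    // Reuses the SPARK_LOCAL_IP convention from this answer; the fallback
    // 172.17.42.1 is a placeholder.
    System.setProperty("spark.driver.host",
      sys.env.getOrElse("SPARK_LOCAL_IP", "172.17.42.1"))
    val sc = new SparkContext("spark://172.17.0.3:7077", "Generic", "/opt/spark-0.9.0")
    // ... the Pi job from the question goes here ...
  }
}

The same effect can be had by setting spark.driver.host through SparkConf; the system-property route just keeps the question's three-argument SparkContext constructor.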
Upvotes: 5