Reputation: 14978
I am using this setup (https://github.com/mvillarrealb/docker-spark-cluster.git) to establish a Spark cluster, but none of the IPs mentioned there (e.g. 10.5.0.2)
are accessible via the browser; every request times out. I am unable to figure out what I am doing wrong.
I am using Docker 2.3 on macOS Catalina.
In the spark-base Dockerfile I am using the following settings instead of the ones given there:
ENV DAEMON_RUN=true
ENV SPARK_VERSION=3.0.0
ENV HADOOP_VERSION=3.2
ENV SCALA_VERSION=2.12.4
ENV SCALA_HOME=/usr/share/scala
ENV SPARK_HOME=/spark
Also, when I try to open the web UI, the console still shows Spark 2.4.3.
Upvotes: 1
Views: 1556
Reputation: 6350
The Dockerfile tells the container which port to expose.
The compose file tells the host which ports to publish and to which container ports the traffic should be forwarded.
If the host port is not specified, Docker assigns a random free one. That helps in this scenario: you have multiple workers, so you cannot bind them all to the same fixed host port; that would result in a conflict.
version: "3.7"
services:
  spark-master:
    image: spydernaz/spark-master:latest
    ports:
      - "9090:8080"
      - "7077:7077"
    volumes:
      - ./apps:/opt/spark-apps
      - ./data:/opt/spark-data
    environment:
      - "SPARK_LOCAL_IP=spark-master"
  spark-worker:
    image: spydernaz/spark-worker:latest
    depends_on:
      - spark-master
    ports:
      - "8081"
    environment:
      - SPARK_MASTER=spark://spark-master:7077
      - SPARK_WORKER_CORES=1
      - SPARK_WORKER_MEMORY=1G
      - SPARK_DRIVER_MEMORY=128m
      - SPARK_EXECUTOR_MEMORY=256m
    volumes:
      - ./apps:/opt/spark-apps
      - ./data:/opt/spark-data
To find the randomly generated published port for each of the workers, run docker ps. Under the PORTS column you should find what you need:
PORTS
0.0.0.0:32768->8080/tcp
Here port 32768 on the host machine (localhost:32768) is forwarded to [worker-IP]:8080 inside the container.
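You can also ask Docker for the mapping directly with docker port <container-id> (the container ID comes from docker ps). As a minimal sketch, a PORTS value in the form shown above can be parsed in the shell; the sample value here is an assumption standing in for real docker ps output:

```shell
# Sample PORTS value as printed by `docker ps` (assumed format; in practice
# you would capture this from `docker ps --format '{{.Ports}}'`)
ports="0.0.0.0:32768->8080/tcp"

# Strip everything up to the last ':' ...
host_port="${ports##*:}"      # leaves "32768->8080/tcp"
# ... then strip the '->8080/tcp' suffix, leaving the published host port
host_port="${host_port%%-*}"  # leaves "32768"

echo "Worker UI: http://localhost:${host_port}"
```

Opening that URL in the browser on the host reaches the worker's web UI, which is the reliable way in, rather than the 10.5.0.x container IPs, which Docker Desktop on macOS does not route to the host.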
Upvotes: 1