Reputation: 413
I have a remote Virtual Machine and I'm developing a Spark Application that runs inside a Docker container.
2018-12-16 13:07:10 INFO main [AbstractConnector] - Started ServerConnector@79c3f01f{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-12-16 13:07:10 INFO main [Utils] - Successfully started service 'SparkUI' on port 4040.
...
2018-12-16 13:07:10 INFO main [SparkUI] - Bound SparkUI to 0.0.0.0, and started at http://f58300e7e6ea:4040
The log says that it launched SparkUI correctly, but bound it to the container's localhost. At this point I decided to EXPOSE
port 4040 during the build phase and to publish it at run time
with -p 4040:4040/tcp
binding the two ports.
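For reference, the relevant pieces look roughly like this (my-spark-app is a placeholder image name, not my real one):
# in the Dockerfile, at build time
EXPOSE 4040
# at run time
docker run -p 4040:4040/tcp my-spark-app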
When I try to reach <remote host name>:4040
from my local machine (in Firefox) I can't connect to SparkUI. I also tried to telnet to it, but got nothing.
When I start the container I can see port 4040 listening using netstat,
but maybe it is not reachable from remote. How can I get this working?
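These are roughly the checks I ran:
# on the Remote VM: confirm the port is listening
netstat -tlnp | grep 4040
# from my local machine: test reachability
telnet <remote host name> 4040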
Basically I want to reach SparkUI from my Home PC --> Remote VM --> (Docker Container with Spark UI) using my browser.
The Remote VM runs RHEL 7.
Upvotes: 6
Views: 9007
Reputation: 1467
You have to map the ports in the docker command for the servers you are starting; see the -p flags below.
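In general the flag maps a host port to a container port (the angle-bracket names are placeholders):
docker run -p <host_port>:<container_port> <image>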
This worked for me on:
2021 M1 Pro Mac Sonoma 14.4.1 (23E224) arm64 architecture
running Docker Desktop 4.37.2 locally
using glue image -> glue/aws-glue-libs:glue_libs_4.0.0_image_01
Command:
$ docker run -it -v ~/.aws:/home/glue_user/.aws -v /Users/x/Library/notebooks:/home/glue_user/workspace/jupyter_workspace/ --rm -p 4040:4040 -p 18080:18080 -p 8998:8998 -p 8888:8888 -e AWS_PROFILE=default -e DISABLE_SSL="true" --name glue_pyspark --mount type=bind,src=/Users/x/Library/dockerdir,dst=/mnt/external public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01 /home/glue_user/jupyter/jupyter_start.sh
After this step I was able to access all the below URLs in my host Chrome browser:
Spark UI on http://localhost:4040
History Server on http://localhost:18080
Livy Server on http://localhost:8998
Jupyter Lab on http://localhost:8888
Reference: AWS article
Upvotes: 0
Reputation: 301
The command below worked for me for pyspark:
docker run -p 4040:4040 --hostname localhost -it apache/spark-py /opt/spark/bin/pyspark
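Once the pyspark shell is up, the UI should be reachable from the host; a quick sanity check from another terminal:
curl -I http://localhost:4040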
Upvotes: 5
Reputation: 3364
In your logs it says that the Spark UI started at http://f58300e7e6ea:4040. Here f58300e7e6ea
is a Docker internal network hostname.
So what you have to do is the following.
First, in your application, before deployment, set these two configs:
spark.driver.bindAddress
to a hostname string of your choice, and spark.driver.host
to your Remote VM's IP address. Second, when you deploy the Docker container from the image, use the --hostname
flag to give the container the hostname you chose earlier. For example: docker run --hostname myHostName --ip 10.1.2.3 ubuntu:16.04
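A minimal sketch of both steps together (myHostName, 10.1.2.3, the jar/class names, and the image name are placeholders, not values from the original setup):
# step 1: set the two driver configs when submitting the application
spark-submit \
  --conf spark.driver.bindAddress=myHostName \
  --conf spark.driver.host=10.1.2.3 \
  --class com.example.App app.jar
# step 2: start the container with the same hostname and publish the UI port
docker run --hostname myHostName -p 4040:4040 <your-spark-image>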
Upvotes: 3