L. Don

Reputation: 413

Cannot reach Spark Web UI located inside a Docker container

I have a remote Virtual Machine and I'm developing a Spark Application that runs inside a Docker container.

2018-12-16 13:07:10 INFO  main [AbstractConnector] - Started ServerConnector@79c3f01f{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-12-16 13:07:10 INFO  main [Utils] - Successfully started service 'SparkUI' on port 4040.
...
2018-12-16 13:07:10 INFO  main [SparkUI] - Bound SparkUI to 0.0.0.0, and started at http://f58300e7e6ea:4040

The log says that it launched the Spark UI correctly, but bound it to the container's localhost. At this point I decided to EXPOSE port 4040 during the build phase and to publish it at run time with -p 4040:4040/tcp, binding the two ports.

When I try to reach <remote host name>:4040 from my local machine (in Firefox) I can't connect to the Spark UI. I also tried to telnet to it, but got no response.

When I start the container I can see port 4040 listening using netstat, but maybe it is not reachable from remote. How can I make it reachable?

Basically I want to reach SparkUI from my Home PC --> Remote VM --> (Docker Container with Spark UI) using my browser.

The Remote VM runs RHEL 7.
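For reference, a minimal sketch of the publish-and-verify steps described above (my-spark-app is a placeholder image name; on RHEL 7, firewalld on the VM may also be blocking the port from outside):

```shell
# Publish the container's Spark UI port on all host interfaces
docker run -d -p 0.0.0.0:4040:4040 my-spark-app   # my-spark-app is a placeholder

# On the VM: confirm something is listening on 4040
netstat -tlnp | grep 4040

# If firewalld is active on the VM, open the port
sudo firewall-cmd --add-port=4040/tcp --permanent
sudo firewall-cmd --reload

# From the local machine: test reachability
curl -I http://<remote host name>:4040
```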

Upvotes: 6

Views: 9007

Answers (3)

nitinr708

Reputation: 1467

You have to map the container's ports to host ports in the docker run command for the servers you are starting; see the -p flag.

Below worked for me on:

2021 M1 Pro Mac Sonoma 14.4.1 (23E224) arm64 architecture

running Docker Desktop 4.37.2 locally

using the Glue image public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01

Command:

$ docker run -it -v ~/.aws:/home/glue_user/.aws -v /Users/x/Library/notebooks:/home/glue_user/workspace/jupyter_workspace/ --rm -p 4040:4040 -p 18080:18080 -p 8998:8998 -p 8888:8888  -e AWS_PROFILE=default -e DISABLE_SSL="true" --name glue_pyspark --mount type=bind,src=/Users/x/Library/dockerdir,dst=/mnt/external public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01 /home/glue_user/jupyter/jupyter_start.sh

After this step I was able to access all the URLs below in my host Chrome browser:

Spark UI on http://localhost:4040

History Server on http://localhost:18080

Livy Server on http://localhost:8998

Jupyter Lab on http://localhost:8888

Reference: AWS article

Upvotes: 0

Pankaj Kumar

Reputation: 301

The command below worked for me for PySpark:

docker run -p 4040:4040 --hostname localhost -it apache/spark-py /opt/spark/bin/pyspark

Upvotes: 5

Hansika Weerasena

Reputation: 3364

In your logs it says that the Spark UI was started at http://f58300e7e6ea:4040; here f58300e7e6ea is a Docker internal network hostname.

So what you have to do is the following.

First, in your application, before deployment, set the following two configs:

  1. spark.driver.bindAddress to any hostname string of your choice
  2. spark.driver.host to your remote VM's IP address

Secondly, when you deploy the Docker container from the image, use the --hostname flag to give the container a hostname, and use the previously selected string. For example: docker run --hostname myHostName --ip 10.1.2.3 ubuntu:16.04
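As a sketch, the two driver settings above can also be passed as --conf flags to spark-submit inside the container (myHostName matches the --hostname given to the container; the VM IP is a placeholder):

```shell
# Bind the driver to the container hostname, but advertise the VM's
# externally reachable address so remote clients and the UI resolve correctly.
spark-submit \
  --conf spark.driver.bindAddress=myHostName \
  --conf spark.driver.host=203.0.113.10 \
  your_app.py   # placeholder application
```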

Upvotes: 3
