Soerendip

Reputation: 9148

How to set port for pyspark jupyter notebook?

I am starting a pyspark jupyter notebook with a script:

#!/bin/bash
ipaddress=...
echo "Start notebook server at IP address $ipaddress"

function snotebook ()
{
#Spark path (based on your computer)
SPARK_PATH=/home/.../software/spark-2.3.1-bin-hadoop2.7

export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

# For python 3 users, you have to add the line below or you will get an error
export PYSPARK_PYTHON=python3

$SPARK_PATH/bin/pyspark --master local[10]
}

snotebook --no-browser --ip $ipaddress --certfile=/home/.../local/mycert.pem --keyfile /home/.../local/mykey.key  

I wonder how to set the port. Is there an environment variable that I can set? I would like to determine the port before the notebook starts. I tried --port 7999.

Upvotes: 0

Views: 3017

Answers (1)

OneCricketeer

Reputation: 191701

If you mean the Spark UI ports, spark-env.sh lists these environment variables, which you can override in your shell or set in that file:

# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
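As a rough illustration, a conf/spark-env.sh fragment that sets the master variables might look like the following. The port numbers are arbitrary placeholders, not values from the question, and the sketch assumes a standalone master is actually being run.

```shell
# Hypothetical values for illustration only; pick ports that are free on your host.
export SPARK_MASTER_PORT=7078        # port the standalone master listens on
export SPARK_MASTER_WEBUI_PORT=8081  # port for the master's web UI
```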

I'm not sure about the Jupyter settings, or whether PySpark even passes them through, but if jupyter notebook --port works on its own, then I would try:

export PYSPARK_DRIVER_PYTHON_OPTS="notebook --port=7999"

If you want to pass all the arguments from snotebook into the variable, then you need:

export PYSPARK_DRIVER_PYTHON_OPTS="notebook $@"
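Putting that together with the function from the question, a minimal sketch could look like this. The helper build_driver_opts is a hypothetical name introduced here for illustration, and the sketch uses "$*" to join the arguments into one string, assuming none of them contain spaces. The elided Spark path is kept as in the question.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): builds the single string
# that PYSPARK_DRIVER_PYTHON_OPTS holds from the forwarded arguments.
build_driver_opts() {
    # "$*" joins all arguments with spaces into one word.
    # Assumption: no argument contains spaces of its own.
    echo "notebook $*"
}

snotebook() {
    # Spark path (elided in the question; adjust for your computer)
    SPARK_PATH=/home/.../software/spark-2.3.1-bin-hadoop2.7

    export PYSPARK_DRIVER_PYTHON="jupyter"
    # Forward everything passed to snotebook, e.g. --port 7999
    PYSPARK_DRIVER_PYTHON_OPTS="$(build_driver_opts "$@")"
    export PYSPARK_DRIVER_PYTHON_OPTS
    export PYSPARK_PYTHON=python3

    "$SPARK_PATH/bin/pyspark" --master "local[10]"
}
```

With this in place, a call such as snotebook --no-browser --ip "$ipaddress" --port 7999 would set the variable to "notebook --no-browser --ip <address> --port 7999" before pyspark starts.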

Upvotes: 1
