Reputation: 767
I installed the gettyimages/spark docker image and jupyter/pyspark-notebook on my machine. However, the Python version in gettyimages/spark is 3.5.3 while the Python version in jupyter/pyspark-notebook is 3.7, so the following error comes up:
Exception: Python in worker has different version 3.5 than that in driver 3.7, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
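To illustrate why 3.5.3 and 3.7 clash while two 3.7.x releases would not, here is a minimal sketch of a major.minor comparison like the one the message describes. The helper names `minor_version` and `check_versions` are hypothetical and this is not PySpark's actual code; it only assumes the comparison happens at major.minor granularity, as the error text says.

```python
def minor_version(version):
    """Reduce a version string such as '3.5.3' to its 'major.minor' part."""
    parts = version.split(".")
    return ".".join(parts[:2])


def check_versions(driver, worker):
    """Raise if driver and worker differ in major.minor (hypothetical helper)."""
    d, w = minor_version(driver), minor_version(worker)
    if d != w:
        raise RuntimeError(
            "Python in worker has different version %s than that in driver %s" % (w, d)
        )
    return d


if __name__ == "__main__":
    # Patch-level differences pass: both sides reduce to the same major.minor.
    print(check_versions("3.7.3", "3.7.1"))
    # The versions reported for the two images differ in the minor number.
    try:
        check_versions("3.7", "3.5.3")
    except RuntimeError as err:
        print(err)
```

So matching only the patch level is not required, but both images need the same minor release.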
So I have tried either to upgrade the Python version of the gettyimages/spark image or to downgrade the Python version of the jupyter/pyspark-notebook docker image to fix it.
Downgrading the jupyter/pyspark-notebook Python version first: I used conda install python=3.5 to downgrade the Python version of the jupyter/pyspark-notebook docker image. However, after doing so, my Jupyter notebook cannot connect to any ipynb and the kernel seems dead. Also, when I type conda again, it shows conda command not found, but the python terminal works well.
I have compared sys.path after the downgrade (first list) and before it (second list):
['', '/usr/local/spark/python', '/usr/local/spark/python/lib/py4j-0.10.7-src.zip', '/opt/conda/lib/python35.zip', '/opt/conda/lib/python3.5', '/opt/conda/lib/python3.5/plat-linux', '/opt/conda/lib/python3.5/lib-dynload', '/opt/conda/lib/python3.5/site-packages']
['', '/usr/local/spark/python', '/usr/local/spark/python/lib/py4j-0.10.7-src.zip', '/opt/conda/lib/python37.zip', '/opt/conda/lib/python3.7', '/opt/conda/lib/python3.7/lib-dynload', '/opt/conda/lib/python3.7/site-packages']
I think that, more or less, it is correct. So why can't my Jupyter notebook connect to the kernel?
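As a rough cross-check of the two sys.path lists above, a small sketch can normalize the version part of each path and diff the results. The helper names `normalize` and `path_diff` are made up for illustration.

```python
import re

# The sys.path list that contains python3.5 entries, copied from above.
paths_35 = ['', '/usr/local/spark/python',
            '/usr/local/spark/python/lib/py4j-0.10.7-src.zip',
            '/opt/conda/lib/python35.zip', '/opt/conda/lib/python3.5',
            '/opt/conda/lib/python3.5/plat-linux',
            '/opt/conda/lib/python3.5/lib-dynload',
            '/opt/conda/lib/python3.5/site-packages']

# The sys.path list that contains python3.7 entries, copied from above.
paths_37 = ['', '/usr/local/spark/python',
            '/usr/local/spark/python/lib/py4j-0.10.7-src.zip',
            '/opt/conda/lib/python37.zip', '/opt/conda/lib/python3.7',
            '/opt/conda/lib/python3.7/lib-dynload',
            '/opt/conda/lib/python3.7/site-packages']


def normalize(path):
    """Replace python35 / python3.5 / python37 / python3.7 with a neutral tag."""
    return re.sub(r"python3\.?\d", "python3.x", path)


def path_diff(a, b):
    """Return (entries only in a, entries only in b) after normalization."""
    na = {normalize(p) for p in a}
    nb = {normalize(p) for p in b}
    return sorted(na - nb), sorted(nb - na)


only_35, only_37 = path_diff(paths_35, paths_37)
print("only in the 3.5 list:", only_35)
print("only in the 3.7 list:", only_37)
```

With these lists, the only entry unique to one side is the plat-linux directory in the 3.5 list, which suggests the path layout alone does not explain the dead kernel.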
Upgrading the gettyimages/spark image:
sudo docker run -it gettyimages/spark:2.4.1-hadoop-3.0 apt-get install python3.7.3 ; python3 -v
However, I find that even after doing so, Spark does not run properly.
I am not quite sure what to do. Could you share how to modify the version of a package inside a docker image?
Upvotes: 4
Views: 23434
Reputation: 631
Looking at the Dockerfile here, it installs python3, which by default installs Python 3.5 on debian:stretch. You can instead get Python 3.7 by editing the Dockerfile and building it yourself. In your Dockerfile, remove lines 19-25 and replace line 1 with the following, then build the image locally.
FROM python:3.7-stretch
If you are not familiar with building your own image, download the Dockerfile and keep it in its own standalone directory. Then, after you cd into that directory, run the command below. You may want to first remove the already downloaded image. After this you should be able to run other docker commands the same way as if you had pulled the image from Docker Hub.
docker build -t gettyimages/spark .
Upvotes: 6