Reputation: 1513
I am trying to install pyspark, following this thread here, particularly the advice from OneCricketeer and zero323.
I have done the following:
1 - Installed pyspark in anaconda3 with conda install -c conda-forge pyspark
2 - Set this up in my .bashrc file:
function snotebook ()
{
#Spark path (based on your computer)
SPARK_PATH=~/spark-3.0.1-bin-hadoop3.2
export ANACONDA_ROOT=~/anaconda3
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
# For python 3 users, you have to add the line below or you will get an error
#export PYSPARK_PYTHON=python3
$SPARK_PATH/bin/pyspark --master local[2]
}
I have Python 3.8.2, anaconda3. I downloaded spark 3.0.1 with hadoop 3.2.
The .bashrc setup partially follows this article from Medium here.
When I run import pyspark as ps, I get No module named 'pyspark'.
What am I missing? Thanks.
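One way to narrow this kind of error down is to check which interpreter is actually running and whether it can see the package at all. The sketch below is only a diagnostic aid under that assumption. The helper name describe_module is hypothetical; it uses only the standard library and does not import pyspark itself.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable from the running interpreter."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    # Run this with the same python/ipython that raises the error.
    for key, value in describe_module("pyspark").items():
        print(f"{key}: {value}")
```

If found is False, the interpreter shown is probably not the one pyspark was installed into, or PYTHONPATH does not point at a directory that contains the package.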
Upvotes: 0
Views: 710
Reputation: 79
I work with PySpark a lot and have found that these three simple steps always work, irrespective of the OS. For this example, I will use macOS.
1 - Install PySpark: pip install pyspark
2 - Install Java 8 (JDK): https://www.oracle.com/in/java/technologies/javase/javase-jdk8-downloads.html
3 - Set the environment variables JAVA_HOME, SPARK_HOME and PYTHONPATH
JAVA_HOME - Check the path where Java is installed. On macOS it is usually the one below:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_291.jdk/Contents/Home/
SPARK_HOME - Check the path where pyspark is installed. One trick is to run the install command again; pip then reports the paths where pyspark and py4j are already installed.
pip install pyspark
Requirement already satisfied: pyspark in /opt/anaconda3/lib/python3.7/site-packages (2.4.0)
Requirement already satisfied: py4j==0.10.7 in /opt/anaconda3/lib/python3.7/site-packages (from pyspark) (0.10.7)
Use the two paths above to set the following environment variables (the py4j version in the zip file name must match the version pip reports):
export SPARK_HOME=/opt/anaconda3/lib/python3.7/site-packages/pyspark
export PYTHONPATH=/opt/anaconda3/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
Run the command below to reload ~/.bash_profile:
source ~/.bash_profile
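Instead of reading the paths off the pip output by hand, they can also be derived from Python. The following is a minimal sketch, assuming the installed package keeps its py4j archive under python/lib inside the pyspark directory, as in the path above. The helper names package_dir, find_py4j_zip and export_lines are hypothetical and only for illustration.

```python
import glob
import importlib.util
import os


def package_dir(name):
    """Return the directory of an installed package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


def find_py4j_zip(spark_home):
    """Return the first py4j source zip under spark_home/python/lib, if any."""
    # Assumed layout: <spark_home>/python/lib/py4j-<version>-src.zip
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None


def export_lines(spark_home, py4j_zip):
    """Build the shell export lines for the given paths."""
    lines = [f"export SPARK_HOME={spark_home}"]
    if py4j_zip:
        lines.append(f"export PYTHONPATH={py4j_zip}:$PYTHONPATH")
    return lines


if __name__ == "__main__":
    home = package_dir("pyspark")
    if home is None:
        print("pyspark is not visible to this interpreter")
    else:
        print("\n".join(export_lines(home, find_py4j_zip(home))))
```

Run with the interpreter that pyspark was installed into, this prints export lines that can be pasted into ~/.bash_profile, so the py4j version in the file name cannot drift from the installed one.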
Upvotes: 2