Tristan Tran

Reputation: 1513

What is the latest config for installing pyspark?

I am trying to install pyspark, following this thread, particularly the advice from OneCricketeer and zero323.

I have done the following:

1 - Install pyspark in anaconda3 with conda install -c conda-forge pyspark

2 - Set up this in my .bashrc file:

function snotebook ()
{
    # Spark path (based on your computer)
    SPARK_PATH=~/spark-3.0.1-bin-hadoop3.2

    export ANACONDA_ROOT=~/anaconda3
    export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
    export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python
    export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH

    # For Python 3 users, you have to add the line below or you will get an error
    # export PYSPARK_PYTHON=python3

    $SPARK_PATH/bin/pyspark --master local[2]
}

I have Python 3.8.2 and anaconda3. I downloaded Spark 3.0.1 with Hadoop 3.2.

The .bashrc setup partially follows this article from Medium.

When I try import pyspark as ps, I get No module named 'pyspark'.
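To narrow down where it fails, a minimal check of which interpreter is actually running and whether it can see pyspark looks like this (just a sketch; the printed paths will differ per machine):

import sys
print(sys.executable)  # should point into ~/anaconda3 if the conda environment is active

try:
    import pyspark
    print(pyspark.__version__, pyspark.__file__)
except ModuleNotFoundError as err:
    print("pyspark is not visible to this interpreter:", err)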

What am I missing? Thanks.

Upvotes: 0

Views: 710

Answers (1)

Jatin Malhotra

Reputation: 79

I work with PySpark a lot and have found these three simple steps that always work, irrespective of the OS. For this example, I will show them on macOS.

  1. Install PySpark using pip:

pip install pyspark

  2. Install Java 8 from the link below:

https://www.oracle.com/in/java/technologies/javase/javase-jdk8-downloads.html

  3. Set up the environment variables in ~/.bash_profile:

[JAVA_HOME, SPARK_HOME, PYTHONPATH]

JAVA_HOME - Check the path where Java is installed. On macOS it is usually the path below.

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_291.jdk/Contents/Home/

SPARK_HOME - Check the path where pyspark is installed. One hack is to run the command below, which reports where pyspark and its py4j dependency are already installed.

pip install pyspark

Requirement already satisfied: pyspark in /opt/anaconda3/lib/python3.7/site-packages (2.4.0)
Requirement already satisfied: py4j==0.10.7 in /opt/anaconda3/lib/python3.7/site-packages (from pyspark) (0.10.7)
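If you would rather not rely on pip's output, a short Python snippet (a sketch, assuming pyspark already imports in that interpreter) prints the same locations:

import glob
import os
import pyspark

# The pyspark package directory is what this recipe uses for SPARK_HOME.
spark_home = os.path.dirname(pyspark.__file__)
print("SPARK_HOME candidate:", spark_home)

# The bundled py4j zip lives under python/lib inside the pyspark install.
print("py4j zip:", glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))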

Use the two paths above to set the following environment variables (match the py4j version in the zip name to the one pip reported):

export SPARK_HOME=/opt/anaconda3/lib/python3.7/site-packages/pyspark

export PYTHONPATH=/opt/anaconda3/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH

Run the command below to reload ~/.bash_profile:

source ~/.bash_profile
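After reloading the profile, a minimal smoke test (a sketch; the app name is arbitrary) is to start a local SparkSession:

from pyspark.sql import SparkSession

# Start a local Spark session and run a trivial job to confirm the setup works.
spark = SparkSession.builder.master("local[2]").appName("smoke-test").getOrCreate()
print(spark.version)
print(spark.range(5).count())  # should print 5
spark.stop()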

Upvotes: 2
