BetterEveryDay

Reputation: 331

No module named pyspark error

This is the exact code from a tutorial I'm following. My classmate didn't get this error with the same code:

ImportError                                Traceback (most recent call last)

<ipython-input-1-c6e1bed850ab> in <module>()
----> 1 from pyspark import SparkContext
      2 sc = SparkContext('local', 'Exam_3')
      3 
      4 from pyspark.sql import SQLContext
      5 sqlContext = SQLContext(sc)

ImportError: No module named pyspark

This is the code:

from pyspark import SparkContext
sc = SparkContext('local', 'Exam_3')
from pyspark.sql import SQLContext    
sqlContext = SQLContext(sc)
data = sc.textFile("exam3")
parsedData = data.map(lambda line: [float(x) for x in line.split(',')])
retail = sqlContext.createDataFrame(parsedData, 
     ['category_name','product_id', 'product_name', 'product_price'])
retail.registerTempTable("exam3")
print parsedData.take(3)

Upvotes: 32

Views: 214347

Answers (9)

moasifk

Reputation: 70

Just in case you are coming here while using the delta-spark module: check the version compatibility of the pyspark and delta-spark packages, and make sure the versions of PySpark and the Delta Lake library you are using are compatible.

  • PySpark 3.2.x: Delta Lake 1.1.x
  • PySpark 3.3.x: Delta Lake 2.x

For example:

pip install pyspark==3.3.0
pip install delta-spark==2.3.0

spark and delta-spark compatibility
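
As a sketch of what a compatible setup looks like (assuming the pinned versions above, pyspark==3.3.0 and delta-spark==2.3.0, are installed), a Delta-enabled session can be built with delta-spark's configure_spark_with_delta_pip helper:

from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# Builder with the Delta SQL extension and catalog enabled
builder = (
    SparkSession.builder
    .appName("delta-compat-check")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)

# configure_spark_with_delta_pip pulls in the Delta jars that match
# the installed delta-spark package
spark = configure_spark_with_delta_pip(builder).getOrCreate()
print(spark.version)  # 3.3.0 if the versions are aligned

If the two packages are mismatched, this is typically where the import or class-loading error surfaces.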

Upvotes: 0

lxgxx

Reputation: 11

I hit this error after running:

conda install pyspark

Then I ran:

pip install pyspark

which installed py4j automatically, and after that the import worked.
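
As a quick sanity check afterwards (a minimal sketch, assuming nothing beyond the two packages), you can confirm that both modules resolve:

import pyspark
import py4j

# Both imports succeeding means the pip install fixed the module path
print(pyspark.__version__, py4j.__version__)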

Upvotes: 1

goodyonsen

Reputation: 81

My solution to this problem was to:

  • open up a brand-new .ipynb on the local machine by running:

jupyter-lab

  • then, in the very first cell, run:

pip install pyspark

My output was:

Collecting pyspark   
  Using cached pyspark-3.2.0.tar.gz (281.3 MB)
  Preparing metadata (setup.py) ... done 
Collecting py4j==0.10.9.2
  Using cached py4j-0.10.9.2-py2.py3-none-any.whl (198 kB) 
Building wheels for collected packages: pyspark   
  Building wheel for pyspark (setup.py) ... done   
  Created wheel for pyspark: filename=pyspark-3.2.0-py2.py3-none-any.whl size=281805913 sha256=26e539058858454dbbb48158111968d67e663c7b53e64c4fd91e38d92ac1cd80 
  Stored in directory: /Users/user/Library/Caches/pip/wheels/2f/f8/95/2ad14a4614b4a9f645ee928fbbd057b1b254c67adb494c9a58 
Successfully built pyspark
Installing collected packages: py4j, pyspark
Successfully installed py4j-0.10.9.2 pyspark-3.2.0 
Note: you may need to restart the kernel to use updated packages.
  • then import pyspark:

import pyspark

You may also want to try running the pip command directly in the lab environment.
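
For reference, a variation not in the original answer: recent Jupyter/IPython versions provide a %pip magic that installs into the running kernel's environment, which avoids installing into a different interpreter than the one the notebook uses:

%pip install pyspark

# Restart the kernel if prompted, then verify the import
import pyspark
print(pyspark.__version__)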

Upvotes: 0

rafshaik

Reputation: 19

import findspark
findspark.init()

At first this failed for me with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'findspark'

Install it first:

$ pip install findspark

and it will work.

Upvotes: 0

mounirboulwafa

Reputation: 1816

First, make sure to install pyspark using conda:

conda install pyspark

Upvotes: 0

kepy97

Reputation: 1074

Here is the latest solution that worked for me (for Mac users only). I installed pyspark via pip install pyspark, but it didn't work when I executed pyspark in the terminal, or even import pyspark in Python, even though I could see that pyspark was already installed on my laptop.

In the end, I found the solution: you just need to add a few environment variables to your bash profile file.

Follow steps:

1) Type the following in a terminal window to go to your home folder.

cd ~

2) Then run the following to create a .bash_profile. (You may skip this if it already exists.)

touch .bash_profile

3) Open the file for editing:

open -e .bash_profile

Then add the following variables.

export SPARK_VERSION=`ls /usr/local/Cellar/apache-spark/ | sort | tail -1`
export SPARK_HOME="/usr/local/Cellar/apache-spark/$SPARK_VERSION/libexec"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH

You need to change the py4j-x.x.x-src.zip version number in the last line to match the one under $SPARK_HOME/python/lib.

4) Once all these variables are assigned, save and close .bash_profile. Then type the following command to reload the file.

. .bash_profile
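
As a quick check (a minimal sketch; the exact paths depend on your Homebrew Spark version), open a fresh terminal and confirm the variables took effect from Python:

import os

# SPARK_HOME should point at .../apache-spark/<version>/libexec
print(os.environ.get("SPARK_HOME"))

# With PYTHONPATH extended as above, this import should now succeed
import pyspark
print(pyspark.__version__)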

Upvotes: 2

Hrvoje

Reputation: 15152

Just use:

import findspark
findspark.init()

import pyspark # only run after findspark.init()

If you don't have the findspark module, install it with:

python -m pip install findspark

Upvotes: 20

DavidWayne

Reputation: 2590

You can use findspark to make Spark accessible at run time. Typically findspark will find the directory where you have installed Spark, but if it is installed in a non-standard location, you can point it to the correct directory. Once you have installed findspark, if Spark is installed at /path/to/spark_home, just put

import findspark
findspark.init('/path/to/spark_home')

at the very top of your script/notebook and you should now be able to access the pyspark module.
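
For instance, the question's own snippet should then run unchanged (a sketch; /path/to/spark_home is the placeholder path from this answer):

import findspark
findspark.init('/path/to/spark_home')  # placeholder; use your actual Spark home

# The import from the question now resolves
from pyspark import SparkContext
sc = SparkContext('local', 'Exam_3')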

Upvotes: 13

Nathaniel Ford

Reputation: 21220

You don't have pyspark installed in a place available to the Python installation you're using. To confirm this, in your command-line terminal, with your virtualenv activated, enter your REPL (python) and type import pyspark:

$ python
Python 3.5.0 (default, Dec  3 2015, 09:58:14) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'pyspark'

If you see the No module named 'pyspark' ImportError, you need to install that library. Quit the REPL and type:

pip install pyspark

Then re-enter the REPL to confirm it works:

$ python
Python 3.5.0 (default, Dec  3 2015, 09:58:14) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>>

As a note, it is critical that your virtual environment is activated. From within your virtual environment's directory:

$ source bin/activate

These instructions are for a unix-based machine, and will vary for Windows.
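
For completeness (an addition, not part of the original answer): on Windows the activation script lives under Scripts rather than bin, so from the virtual environment's directory it is:

> Scripts\activate

(PowerShell users would run Scripts\Activate.ps1 instead.)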

Upvotes: 36
