Reputation: 331
This is the exact code from a tutorial I'm following. My classmate didn't get this error with the same code:
ImportError                               Traceback (most recent call last)
<ipython-input-1-c6e1bed850ab> in <module>()
----> 1 from pyspark import SparkContext
      2 sc = SparkContext('local', 'Exam_3')
      3
      4 from pyspark.sql import SQLContext
      5 sqlContext = SQLContext(sc)

ImportError: No module named pyspark
This is the code:
from pyspark import SparkContext
sc = SparkContext('local', 'Exam_3')
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
data = sc.textFile("exam3")
parsedData = data.map(lambda line: [float(x) for x in line.split(',')])
retail = sqlContext.createDataFrame(parsedData,
    ['category_name','product_id', 'product_name', 'product_price'])
retail.registerTempTable("exam3")
print parsedData.take(3)
Upvotes: 32
Views: 214347
Reputation: 70
Just in case you are coming here while using the delta-spark module: check the version compatibility of the spark and delta-spark packages, and make sure the versions of PySpark and the Delta Lake library you are using are compatible.
e.g.:
pip install pyspark==3.3.0
pip install delta-spark==2.3.0
spark and delta-spark compatibility
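As a sketch of what a matched pair then looks like in a session (following the Delta Lake quickstart; the app name is just an example), you would build the SparkSession through the helper that delta-spark provides:

from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip  # installed by delta-spark

# Enable the Delta SQL extension and catalog, and let the helper pull in the
# delta-core jars that match the installed delta-spark version.
builder = (
    SparkSession.builder.appName("delta_check")  # hypothetical app name
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()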
Upvotes: 0
Reputation: 11
I met this error after
conda install pyspark
Then I ran
pip install pyspark
which installed py4j automatically, and after that the import worked.
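To confirm that both packages ended up in the same, active environment, a quick check is:

import py4j      # pulled in automatically as a pyspark dependency
import pyspark

# Both imports should now succeed from the same environment; the version
# shows which pyspark installation is actually being picked up.
print(pyspark.__version__)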
Upvotes: 1
Reputation: 81
My solution to this problem was to run:
$ jupyter-lab
$ pip install pyspark
My output was:
Collecting pyspark
Using cached pyspark-3.2.0.tar.gz (281.3 MB)
Preparing metadata (setup.py) ... done
Collecting py4j==0.10.9.2
Using cached py4j-0.10.9.2-py2.py3-none-any.whl (198 kB)
Building wheels for collected packages: pyspark
Building wheel for pyspark (setup.py) ... done
Created wheel for pyspark: filename=pyspark-3.2.0-py2.py3-none-any.whl size=281805913 sha256=26e539058858454dbbb48158111968d67e663c7b53e64c4fd91e38d92ac1cd80
Stored in directory: /Users/user/Library/Caches/pip/wheels/2f/f8/95/2ad14a4614b4a9f645ee928fbbd057b1b254c67adb494c9a58
Successfully built pyspark
Installing collected packages: py4j, pyspark
Successfully installed py4j-0.10.9.2 pyspark-3.2.0
Note: you may need to restart the kernel to use updated packages.
import pyspark
You may also want to try running the pip command directly in the lab environment, as shown in the sketch below.
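A minimal way to do that from a notebook cell (a sketch using only the standard library) is to invoke pip through the kernel's own interpreter, so the install and the later import cannot end up in different environments:

import subprocess
import sys

# Install pyspark with the exact interpreter this kernel runs on, so pip
# cannot silently target a different Python installation.
subprocess.check_call([sys.executable, "-m", "pip", "install", "pyspark"])

import pyspark
print(pyspark.__version__)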
Upvotes: 0
Reputation: 19
import findspark
findspark.init()
If that raises:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'findspark'
then install it first:
$ pip install findspark
and it will work.
Upvotes: 0
Reputation: 1816
First, make sure to install pyspark using conda:
conda install pyspark
Upvotes: 0
Reputation: 1074
Here is the latest solution that worked for me, FOR MAC users only. I installed pyspark through pip install pyspark, but it didn't work when I executed pyspark in the terminal or even ran import pyspark in Python, although I checked that pyspark was already installed on my laptop.
In the end, I found the solution: you just need to add a few variables to your bash profile file.
Follow steps:
1) Type the following in a terminal window to go to your home folder.
cd ~
2) Then run the following to create a .bash_profile. (You may skip this if it already exists.)
touch .bash_profile
3) Open it in an editor:
open -e .bash_profile
Then add the following variables.
export SPARK_VERSION=`ls /usr/local/Cellar/apache-spark/ | sort | tail -1`
export SPARK_HOME="/usr/local/Cellar/apache-spark/$SPARK_VERSION/libexec"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
You need to change the py4j-x.x.x-src.zip version number in the last line to match the py4j version shipped with your Spark installation (look under $SPARK_HOME/python/lib).
4) Once all these variables are assigned, save and close .bash_profile. Then type the following command to reload the file.
. .bash_profile
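After opening a new terminal session (or sourcing the profile as above), a quick check that Python now sees Spark (assuming the variables were exported as written) is:

import os
import sys

# SPARK_HOME should point at the Homebrew Spark libexec directory, and the
# Spark python/ directory plus the py4j zip should be on the module path.
print(os.environ.get("SPARK_HOME"))
print([p for p in sys.path if "spark" in p.lower()])

import pyspark  # should now import without an ImportError
print(pyspark.__version__)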
Upvotes: 2
Reputation: 15152
Just use:
import findspark
findspark.init()
import pyspark # only run after findspark.init()
If you don't have the findspark module, install it with:
python -m pip install findspark
Upvotes: 20
Reputation: 2590
You can use findspark to make spark accessible at run time. Typically findspark will find the directory where you have installed spark, but if it is installed in a non-standard location, you can point it to the correct directory. Once you have installed findspark, if spark is installed at /path/to/spark_home, just put
import findspark
findspark.init('/path/to/spark_home')
at the very top of your script/notebook and you should now be able to access the pyspark module.
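For example, prepending it to the question's own code would look roughly like this (/path/to/spark_home is a placeholder for your actual Spark installation directory):

import findspark
findspark.init('/path/to/spark_home')  # placeholder path

# The question's imports now resolve because findspark has added Spark's
# python/ directory and the bundled py4j zip to sys.path.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext('local', 'Exam_3')
sqlContext = SQLContext(sc)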
Upvotes: 13
Reputation: 21220
You don't have pyspark installed in a place available to the python installation you're using. To confirm this, on your command line terminal, with your virtualenv activated, enter your REPL (python) and type import pyspark:
$ python
Python 3.5.0 (default, Dec 3 2015, 09:58:14)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'pyspark'
If you see the No module named 'pyspark' ImportError, you need to install that library. Quit the REPL and type:
pip install pyspark
Then re-enter the repl to confirm it works:
$ python
Python 3.5.0 (default, Dec 3 2015, 09:58:14)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>>
As a note, it is critical your virtual environment is activated. When in the directory of your virtual environment:
$ source bin/activate
These instructions are for a Unix-based machine and will vary for Windows.
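If the import still fails inside an activated virtualenv, a short standard-library check of which interpreter and site-packages directory are actually in use can reveal a pip/python mismatch:

import sys
import sysconfig

# sys.executable should live inside your virtualenv; if it does not, pip
# installed pyspark into a different Python than the one you are running.
print(sys.executable)
print(sysconfig.get_paths()["purelib"])  # where "pip install pyspark" lands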
Upvotes: 36