Trango

Error when importing VectorAssembler in JupyterLab (PySpark)

I am running this import statement:

from pyspark.ml.feature import VectorAssembler

And this is the full traceback:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[5], line 1
----> 1 from pyspark.ml.feature import VectorAssembler

File /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/pyspark/ml/__init__.py:22
      1 #
      2 # Licensed to the Apache Software Foundation (ASF) under one or more
      3 # contributor license agreements.  See the NOTICE file distributed with
   (...)
     15 # limitations under the License.
     16 #
     18 """
     19 DataFrame-based machine learning APIs to let users quickly assemble and configure practical
     20 machine learning pipelines.
     21 """


Answers (2)

Trango

How to add MLlib library to Spark?

This solved my issue:

Run pip install numpy (or pip3 install numpy if that fails). The traceback says the numpy module is not found.
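
After installing, you can confirm the notebook kernel actually sees numpy with a quick check in a new cell (a minimal sanity check, assuming the pip you ran belongs to the same environment as the kernel):

import numpy
print(numpy.__version__)  # any version string here means the missing-module error is resolved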


SeanH

The traceback you posted suggests that pyspark, or one of its dependencies, isn't set up correctly in your Python environment.

First, if you've had success with pyspark in JupyterLab before, make sure you're using that same kernel.

[screenshot: Kernel > Change Kernel menu navigation in JupyterLab]
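
To check which environment the current kernel is actually running, you can print its interpreter path from a notebook cell (a quick diagnostic):

import sys
print(sys.executable)  # should point at the Python installation that has pyspark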

If that wasn't it, you can rule out a missing or corrupted pyspark installation with:

pip uninstall pyspark
pip install pyspark
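
To verify the reinstall worked, you can check for the module spec from a restarted kernel (a minimal diagnostic using only the standard library):

import importlib.util
print(importlib.util.find_spec("pyspark"))  # None means pyspark still isn't on this interpreter's path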

If you're sure you have pyspark installed, consider installing and using findspark (pip install findspark), which automatically finds and adds pyspark to your sys.path at runtime.

import findspark
findspark.init()  # locates your Spark installation and adds pyspark to sys.path
from pyspark.ml.feature import VectorAssembler  # should now resolve

If all else fails, you can make a new Python environment with pyspark in it. This process isn't unique to JupyterLab; any Jupyter notebook tutorial will point you in the right direction. I'll assume you're on macOS based on the /Library/Frameworks line in your traceback:

python -m venv MY_ENVIRONMENT_NAME          # create an isolated environment
source MY_ENVIRONMENT_NAME/bin/activate     # activate it in this shell
pip install pyspark
pip install ipykernel                       # lets you register the environment as a Jupyter kernel
python -m ipykernel install --user --name MY_KERNEL_NAME

Replace MY_ENVIRONMENT_NAME and MY_KERNEL_NAME as you like. After this, the new environment will show in your list of kernels. Select it and you're good to go.
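
Once you've switched to the new kernel, a quick sanity check in a fresh cell confirms the import works (a minimal sketch):

import pyspark
print(pyspark.__version__)  # proves pyspark is importable in this kernel
from pyspark.ml.feature import VectorAssembler  # the import that originally failed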
