Narendra Pinnaka

Reputation: 215

spark-submit a Python file and getting "No module named pandas"

 spark-submit --master yarn-cluster --deploy-mode cluster test.py

It ends up with this error:

import pandas as pd
ImportError: No module named pandas

This is the only error I see.

I am using the Anaconda Python 2.7 distribution, and the [PYSPARK_VENV]/lib/python2.7/site-packages/ location has pandas.

Upvotes: 0

Views: 3151

Answers (2)

Sincole Brans

Reputation: 304

Setting the PYSPARK_PYTHON path should solve this:

Check the path of the Python interpreter that has pandas installed (for Anaconda: which python), then:

export PYSPARK_PYTHON=/python/path/from/above
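
In yarn cluster mode the driver itself runs inside YARN, so exporting the variable on the gateway machine may not reach it; the same setting can also be passed through spark-submit configuration. A minimal sketch, assuming the Anaconda interpreter lives at /opt/anaconda2/bin/python on every node (a hypothetical path, adjust it for your cluster):

# Hypothetical interpreter path; it must exist on every YARN node
PY=/opt/anaconda2/bin/python

# Pass the interpreter to both the application master (driver) and the executors
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=$PY \
  --conf spark.executorEnv.PYSPARK_PYTHON=$PY \
  test.py

For yarn client mode, exporting PYSPARK_PYTHON in the shell before calling spark-submit is usually enough.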

Upvotes: 1

james.bondu

Reputation: 1162

You can check whether pandas is installed in the [PYSPARK_VENV]/lib/python2.7/site-packages/ folder. It looks like you are executing your PySpark application with a different Python interpreter, so please make sure the pandas package is installed for that interpreter.

You can use Anaconda to manage Python packages in these kinds of situations.
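
As a quick check, the interpreter that is supposed to hold pandas can be probed directly from the shell. A minimal sketch, assuming an Anaconda install at /opt/anaconda2 (a hypothetical path; substitute your own):

# Verify pandas is importable from the assumed Anaconda interpreter
/opt/anaconda2/bin/python -c "import pandas; print(pandas.__version__)"

# If the import fails, install pandas for that interpreter
/opt/anaconda2/bin/conda install pandas

If spark-submit is picking up a different interpreter, repeat the check against that interpreter's path instead.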

Upvotes: 0
