Reputation: 215
Running
spark-submit --master yarn-cluster --deploy-mode cluster test.py
ends up with this error:
import pandas as pd
ImportError: No module named pandas
This is the only error I see.
I am using the Anaconda Python 2.7 distribution, and the [PYSPARK_VENV]/lib/python2.7/site-packages/ location has pandas installed.
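A quick way to confirm which interpreter actually runs the job is to report it from inside test.py, above the `import pandas as pd` line. The sketch below is a hypothetical helper (interpreter_report is not part of Spark): it returns sys.executable, the Python version, and the location pandas was imported from, or None if the import fails. In yarn-cluster mode the driver runs on a cluster node, so its printed output ends up in the YARN container logs rather than in your terminal.

```python
# Hypothetical diagnostic: place it above `import pandas as pd` in test.py
# to see which interpreter runs the driver and whether it can import pandas.
import sys


def interpreter_report():
    """Return the running interpreter's path, version and pandas location."""
    try:
        import pandas  # noqa: F401
        pandas_location = pandas.__file__
    except ImportError:
        pandas_location = None  # pandas is not importable from this interpreter
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "pandas": pandas_location,
    }


if __name__ == "__main__":
    print(interpreter_report())
```

If "executable" is not the Anaconda python whose site-packages contains pandas, the job is running on a different interpreter.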
Upvotes: 0
Views: 3151
Reputation: 304
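Setting the PYSPARK_PYTHON path should solve this. PYSPARK_PYTHON must point to the Python interpreter that has pandas installed, not to the pyspark script:
check the interpreter path (with Anaconda active) using: which python
export PYSPARK_PYTHON=/python/path/from/above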
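In cluster mode the driver runs inside the YARN ApplicationMaster on a cluster node, so a variable exported in the submitting shell may not reach it, depending on the Spark version. Spark's YARN documentation describes spark.yarn.appMasterEnv.[EnvironmentVariableName] for setting the driver's environment, and spark.executorEnv.[EnvironmentVariableName] sets it for executors. The sketch below only builds such a command; the helper name build_submit_command and the path /opt/anaconda2/bin/python are assumptions, and that interpreter is assumed to exist at the same path on every node. It uses --master yarn with --deploy-mode cluster, because yarn-cluster as a master value is deprecated since Spark 2.0.

```python
def build_submit_command(python_path, script, extra_args=()):
    """Build a spark-submit argv that pins PYSPARK_PYTHON for a YARN cluster-mode job.

    python_path is assumed to be a Python interpreter (with pandas installed)
    present at the same path on every YARN node.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # Environment of the driver, which runs inside the YARN ApplicationMaster.
        "--conf", "spark.yarn.appMasterEnv.PYSPARK_PYTHON=" + python_path,
        # Environment of the executor processes.
        "--conf", "spark.executorEnv.PYSPARK_PYTHON=" + python_path,
        script,
    ] + list(extra_args)


if __name__ == "__main__":
    # Hypothetical interpreter path; paths are assumed to contain no spaces.
    cmd = build_submit_command("/opt/anaconda2/bin/python", "test.py")
    print(" ".join(cmd))
```

The returned list can also be passed directly to subprocess.check_call(cmd) instead of being copied into a shell.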
Upvotes: 1
Reputation: 1162
Check whether pandas is installed in the [PYSPARK_VENV]/lib/python2.7/site-packages/ folder. It looks like you are running your PySpark application with a different Python interpreter. Make sure the pandas package is installed for that interpreter.
You can use Anaconda to manage Python packages in these kinds of situations.
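To check that pandas is installed for a particular interpreter, you can ask that interpreter directly. This is a minimal sketch with a hypothetical helper, interpreter_has_module: it runs `<python> -c "import <module>"` and treats exit status 0 as success. On a cluster it has to be run on each worker node, since every node needs its own copy of the package.

```python
import os
import subprocess
import sys


def interpreter_has_module(python_path, module):
    """Return True if the interpreter at python_path can import `module`."""
    with open(os.devnull, "w") as devnull:
        try:
            # Child process exits with 0 only if the import succeeds.
            status = subprocess.call(
                [python_path, "-c", "import " + module],
                stdout=devnull,
                stderr=devnull,
            )
        except OSError:
            return False  # the interpreter path does not exist or is not executable
    return status == 0


if __name__ == "__main__":
    # Checks the interpreter running this script; pass another path to test that one.
    print(interpreter_has_module(sys.executable, "pandas"))
```

For example, interpreter_has_module("/opt/anaconda2/bin/python", "pandas") (a hypothetical path) returns False if that interpreter is missing or cannot import pandas.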
Upvotes: 0