Evan Zamir

Reputation: 8481

Trying to install pandas for Pyspark running on Amazon EMR

This question could really apply to any Python package. I have a bootstrap script that runs before my Spark jobs, and I assume I need to install pandas in that script. I've tried many different things (pip install, easy_install, yum install, etc.), but nothing seems to work: every Spark job fails because pandas cannot be imported. I'm running EMR v5.12.1 and Python 3.4.

Upvotes: 5

Views: 8497

Answers (1)

harmands

Reputation: 1112

sudo python3 -m pip install pandas

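This is what we have in our bootstrap.sh to install pandas. Running pip as `python3 -m pip` installs the package for the python3 interpreter specifically, rather than for whichever `pip` happens to be first on the PATH.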

Upvotes: 7
