mr-sk

Reputation: 13417

Can't get pip install to work on EMR cluster

I have an EMR (emr-5.30.0) cluster I'm trying to start with a bootstrap file in S3. The contents of the bootstrap file are:

#!/bin/bash
sudo pip3 install --user \
     matplotlib \
     pandas \
     pyarrow \
     pyspark

And the error in my stderr file is:

WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Command "python setup.py egg_info" failed with error code 1 in /mnt/tmp/pip-build-br9bn1h3/pyspark/

This seems pretty simple, but I have no idea what is going on. Any help is appreciated.

EDIT:

I tried @Dennis Traub's suggestion and got the same error. The new EMR bootstrap script looks like this:

#!/bin/bash
sudo pip3 install --upgrade setuptools
sudo pip3 install --user matplotlib pandas pyarrow pyspark

Upvotes: 3

Views: 4017

Answers (2)

SnigJi

Reputation: 1410

#!/bin/bash

sudo python3 -m pip install matplotlib pandas pyarrow

Do NOT install pyspark. It should already be present on EMR with the required configuration, and installing it again with pip may cause problems.
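If the package list comes from somewhere else and might contain pyspark, a bootstrap script could filter it out before calling pip. The following is only a rough sketch: `filter_packages` and the `DRY_RUN` switch are hypothetical names made up for illustration, not part of EMR or pip. By default it only prints the command, so it can be checked before being uploaded as a bootstrap action.

```shell
#!/bin/bash

# Hypothetical helper: print each requested package on its own line,
# skipping pyspark because EMR is expected to provide it already.
filter_packages() {
    for pkg in "$@"; do
        if [ "$pkg" != "pyspark" ]; then
            printf '%s\n' "$pkg"
        fi
    done
}

packages=$(filter_packages matplotlib pandas pyarrow pyspark)

# DRY_RUN is an assumed switch for this sketch; it defaults to 1 (print only).
if [ "${DRY_RUN:-1}" = "1" ]; then
    # Unquoted on purpose so the newline-separated names collapse to one line.
    echo sudo python3 -m pip install $packages
else
    # Unquoted on purpose so each package becomes a separate argument.
    sudo python3 -m pip install $packages
fi
```

Setting `DRY_RUN=0` in the script's environment makes it run the same `python3 -m pip install` command as the script above instead of printing it.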

Upvotes: 10

Dennis Traub

Reputation: 51684

You might have an outdated version of setuptools. Try the following script:

#!/bin/bash
sudo pip3 install --upgrade setuptools
sudo pip3 install --user matplotlib pandas pyarrow pyspark

Upvotes: 0
