Reputation: 367
I would like to install additional Python libraries when setting up an AWS EMR cluster (release 6.0.0).
I know I can do this by creating a file called bootstrap.sh, uploading it to S3, and setting a bootstrap action that calls it when the cluster is set up. Contents of bootstrap.sh:
sudo pip3 install mlxtend imbalanced-learn etc etc...
However, I have a separate requirements.txt file that lists all the Python libraries I need.
If I put 'pip3 install -r requirements.txt' into bootstrap.sh, the script won't be able to find requirements.txt, since I am only allowed to specify one S3 file per bootstrap action.
Is there any way around this?
Upvotes: 0
Views: 787
Reputation: 36
You can copy requirements.txt from your S3 bucket to a local directory on the EMR node and then run pip install against the local copy, e.g.
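#!/bin/bash
# Copy requirements.txt from S3 into the node's current working directory
aws s3 cp s3://<my-bucket>/requirements.txt .
# Install every package listed in the local copy
sudo pip-3.6 install -r requirements.txt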
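A slightly more defensive variant of the same idea is sketched below. It is only a sketch under a few assumptions: the function name `fetch_and_install` is made up for illustration, the S3 URI of requirements.txt is passed to the script as its first argument, and the pip executable is `pip3` as in the question (the exact name can differ between EMR releases, so check what is installed on your nodes).

```bash
#!/bin/bash
# Stop at the first failing command so a broken download or install
# makes the bootstrap action fail instead of passing silently.
set -euo pipefail

# Assumption: the S3 URI of requirements.txt is the first script argument.
REQUIREMENTS_S3_URI="${1:-}"

# Hypothetical helper: download the requirements file, then install from it.
fetch_and_install() {
  local s3_uri="$1"
  local workdir
  # Use a fresh temporary directory instead of relying on the working directory.
  workdir="$(mktemp -d)"
  aws s3 cp "$s3_uri" "$workdir/requirements.txt"
  # Assumption: pip3 is the right executable on this release.
  sudo pip3 install -r "$workdir/requirements.txt"
}

# Only run when a URI was actually supplied.
if [ -n "$REQUIREMENTS_S3_URI" ]; then
  fetch_and_install "$REQUIREMENTS_S3_URI"
fi
```

Bootstrap actions accept arguments, so the location of requirements.txt can be supplied when the bootstrap action is defined rather than hardcoded in the script. That should let the same bootstrap.sh be reused with different requirements files.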
Upvotes: 2