Javier Lopez Tomas
Javier Lopez Tomas

Reputation: 2362

Sagemaker lifecycle configuration for installing pandas not working

I am trying to update pandas within a lifecycle configuration, and following the example of AWS I have the next code:

#!/bin/bash

set -e

# OVERVIEW
# This script installs a single pip package in a single SageMaker conda environments.

sudo -u ec2-user -i <<EOF
# PARAMETERS
PACKAGE=pandas
ENVIRONMENT=python3
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
pip install --upgrade "$PACKAGE"==0.25.3
source /home/ec2-user/anaconda3/bin/deactivate
EOF

Then I attach it to a notebook and when I enter the notebook and open a notebook file, I see that pandas have not been updated. Using !pip show pandas I get:

Name: pandas
Version: 0.24.2
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: http://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages
Requires: pytz, python-dateutil, numpy
Required-by: sparkmagic, seaborn, odo, hdijupyterutils, autovizwidget

So we can see that I am indeed in the python3 env although the version is 0.24.

However, the log in cloudwatch shows that it has been installed:

Collecting pandas==0.25.3 Downloading https://files.pythonhosted.org/packages/52/3f/f6a428599e0d4497e1595030965b5ba455fd8ade6e977e3c819973c4b41d/pandas-0.25.3-cp36-cp36m-manylinux1_x86_64.whl (10.4MB)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: pytz>=2017.2 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (2018.4)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: python-dateutil>=2.6.1 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (2.7.3)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (1.16.4)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: six>=1.5 in ./anaconda3/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas==0.25.3) (1.13.0)
2020-02-03T12:33:09.065+01:00
Installing collected packages: pandas Found existing installation: pandas 0.24.2 Uninstalling pandas-0.24.2: Successfully uninstalled pandas-0.24.2
2020-02-03T12:33:12.066+01:00
Successfully installed pandas-0.25.3

What could be the problem?

Upvotes: 2

Views: 2380

Answers (2)

Jing Xue
Jing Xue

Reputation: 419

I have encountered the exact same problem when package was not available in the notebook while Lifecycle Cloudwatch indicated successful installation for the specific kernel. The solution that worked for me is to make sure installation completes before opening up notebook.

Upvotes: 0

ReKx
ReKx

Reputation: 1066

if you want to install the packages only in for the python3 environment, use the following script in your Create Sagemaker Lifecycle configurations.

#!/bin/bash
sudo -u ec2-user -i <<'EOF'

# This will affect only the Jupyter kernel called "conda_python3".
source activate python3

# Replace myPackage with the name of the package you want to install.
pip install pandas==0.25.3
# You can also perform "conda install" here as well.
source deactivate
EOF

Reference : "Lifecycle Configuration Best Practices"

Upvotes: 1

Related Questions