Reputation: 2362
I am trying to update pandas within a lifecycle configuration, and following the example of AWS I have the next code:
#!/bin/bash
set -e
# OVERVIEW
# This script installs a single pip package in a single SageMaker conda environments.
sudo -u ec2-user -i <<EOF
# PARAMETERS
PACKAGE=pandas
ENVIRONMENT=python3
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
pip install --upgrade "$PACKAGE"==0.25.3
source /home/ec2-user/anaconda3/bin/deactivate
EOF
Then I attach it to a notebook and when I enter the notebook and open a notebook file, I see that pandas have not been updated. Using !pip show pandas
I get:
Name: pandas
Version: 0.24.2
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: http://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages
Requires: pytz, python-dateutil, numpy
Required-by: sparkmagic, seaborn, odo, hdijupyterutils, autovizwidget
So we can see that I am indeed in the python3 env although the version is 0.24.
However, the log in cloudwatch shows that it has been installed:
Collecting pandas==0.25.3 Downloading https://files.pythonhosted.org/packages/52/3f/f6a428599e0d4497e1595030965b5ba455fd8ade6e977e3c819973c4b41d/pandas-0.25.3-cp36-cp36m-manylinux1_x86_64.whl (10.4MB)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: pytz>=2017.2 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (2018.4)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: python-dateutil>=2.6.1 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (2.7.3)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (1.16.4)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: six>=1.5 in ./anaconda3/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas==0.25.3) (1.13.0)
2020-02-03T12:33:09.065+01:00
Installing collected packages: pandas Found existing installation: pandas 0.24.2 Uninstalling pandas-0.24.2: Successfully uninstalled pandas-0.24.2
2020-02-03T12:33:12.066+01:00
Successfully installed pandas-0.25.3
What could be the problem?
Upvotes: 2
Views: 2380
Reputation: 419
I have encountered the exact same problem when package was not available in the notebook while Lifecycle Cloudwatch indicated successful installation for the specific kernel. The solution that worked for me is to make sure installation completes before opening up notebook.
Upvotes: 0
Reputation: 1066
if you want to install the packages only in for the python3 environment, use the following script in your Create Sagemaker Lifecycle configurations.
#!/bin/bash
sudo -u ec2-user -i <<'EOF'
# This will affect only the Jupyter kernel called "conda_python3".
source activate python3
# Replace myPackage with the name of the package you want to install.
pip install pandas==0.25.3
# You can also perform "conda install" here as well.
source deactivate
EOF
Reference : "Lifecycle Configuration Best Practices"
Upvotes: 1