Reputation: 2791
I have a Sagemaker Jupyter notebook instance that I keep leaving online overnight by mistake, unnecessarily costing money...
Is there any way to automatically stop the Sagemaker notebook instance when there is no activity for say, 1 hour? Or would I have to make a custom script?
Upvotes: 25
Views: 29763
Reputation: 11
Finally Amazon SageMaker Studio now supports automatic shutdown of idle applications, no need to manage It you just need to enable It at The domain Level or Profile Level. It's only available for image version 2.0 or newer.
Upvotes: 0
Reputation: 995
You can use this code:
#!/bin/bash
set -ex
# OVERVIEW
# This script stops a SageMaker notebook once it's idle for more than 1 hour (default time)
# You can change the idle time for stop using the environment variable below.
# If you want the notebook the stop only if no browsers are open, remove the --ignore-connections flag
#
# Note that this script will fail if either condition is not met
# 1. Ensure the Notebook Instance has internet connectivity to fetch the example config
# 2. Ensure the Notebook Instance execution role permissions to SageMaker:StopNotebookInstance to stop the notebook
# and SageMaker:DescribeNotebookInstance to describe the notebook.
#
# PARAMETERS
IDLE_TIME=3600
echo "Fetching the autostop script"
wget https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/master/scripts/auto-stop-idle/autostop.py
echo "Detecting Python install with boto3 install"
# Find which install has boto3 and use that to run the cron command. So will use default when available
# Redirect stderr as it is unneeded
CONDA_PYTHON_DIR=$(source /home/ec2-user/anaconda3/bin/activate /home/ec2-user/anaconda3/envs/JupyterSystemEnv && which python)
if $CONDA_PYTHON_DIR -c "import boto3" 2>/dev/null; then
PYTHON_DIR=$CONDA_PYTHON_DIR
elif /usr/bin/python -c "import boto3" 2>/dev/null; then
PYTHON_DIR='/usr/bin/python'
else
# If no boto3 just quit because the script won't work
echo "No boto3 found in Python or Python3. Exiting..."
exit 1
fi
echo "Found boto3 at $PYTHON_DIR"
echo "Starting the SageMaker autostop script in cron"
(crontab -l 2>/dev/null; echo "*/5 * * * * $PYTHON_DIR $PWD/autostop.py --time $IDLE_TIME --ignore-connections >> /var/log/jupyter.log") | crontab -
Upvotes: 0
Reputation: 11
SageMaker Studio Notebook Kernels can be terminated by attaching the following lifecycle configuration script to the domain.
#!/bin/bash
# This script installs the idle notebook auto-checker server extension to SageMaker Studio
# The original extension has a lab extension part where users can set the idle timeout via a Jupyter Lab widget.
# In this version the script installs the server side of the extension only. The idle timeout
# can be set via a command-line script which will be also created by this create and places into the
# user's home folder
#
# Installing the server side extension does not require Internet connection (as all the dependencies are stored in the
# install tarball) and can be done via VPCOnly mode.
set -eux
# timeout in minutes
export TIMEOUT_IN_MINS=120
# Should already be running in user home directory, but just to check:
cd /home/sagemaker-user
# By working in a directory starting with ".", we won't clutter up users' Jupyter file tree views
mkdir -p .auto-shutdown
# Create the command-line script for setting the idle timeout
cat > .auto-shutdown/set-time-interval.sh << EOF
#!/opt/conda/bin/python
import json
import requests
TIMEOUT=${TIMEOUT_IN_MINS}
session = requests.Session()
# Getting the xsrf token first from Jupyter Server
response = session.get("http://localhost:8888/jupyter/default/tree")
# calls the idle_checker extension's interface to set the timeout value
response = session.post("http://localhost:8888/jupyter/default/sagemaker-studio-autoshutdown/idle_checker",
json={"idle_time": TIMEOUT, "keep_terminals": False},
params={"_xsrf": response.headers['Set-Cookie'].split(";")[0].split("=")[1]})
if response.status_code == 200:
print("Succeeded, idle timeout set to {} minutes".format(TIMEOUT))
else:
print("Error!")
print(response.status_code)
EOF
chmod +x .auto-shutdown/set-time-interval.sh
# "wget" is not part of the base Jupyter Server image, you need to install it first if needed to download the tarball
sudo yum install -y wget
# You can download the tarball from GitHub or alternatively, if you're using VPCOnly mode, you can host on S3
wget -O .auto-shutdown/extension.tar.gz https://github.com/aws-samples/sagemaker-studio-auto-shutdown-extension/raw/main/sagemaker_studio_autoshutdown-0.1.5.tar.gz
# Or instead, could serve the tarball from an S3 bucket in which case "wget" would not be needed:
# aws s3 --endpoint-url [S3 Interface Endpoint] cp s3://[tarball location] .auto-shutdown/extension.tar.gz
# Installs the extension
cd .auto-shutdown
tar xzf extension.tar.gz
cd sagemaker_studio_autoshutdown-0.1.5
# Activate studio environment just for installing extension
export AWS_SAGEMAKER_JUPYTERSERVER_IMAGE="${AWS_SAGEMAKER_JUPYTERSERVER_IMAGE:-'jupyter-server'}"
if [ "$AWS_SAGEMAKER_JUPYTERSERVER_IMAGE" = "jupyter-server-3" ] ; then
eval "$(conda shell.bash hook)"
conda activate studio
fi;
pip install --no-dependencies --no-build-isolation -e .
jupyter serverextension enable --py sagemaker_studio_autoshutdown
if [ "$AWS_SAGEMAKER_JUPYTERSERVER_IMAGE" = "jupyter-server-3" ] ; then
conda deactivate
fi;
# Restarts the jupyter server
nohup supervisorctl -c /etc/supervisor/conf.d/supervisord.conf restart jupyterlabserver
# Waiting for 30 seconds to make sure the Jupyter Server is up and running
sleep 30
# Calling the script to set the idle-timeout and active the extension
/home/sagemaker-user/.auto-shutdown/set-time-interval.sh
Resource
Upvotes: 0
Reputation: 301
After we've burned quite a lot of money by forgetting to turn off these machines, I've decided to create a script. It's based on AWS' script, but provides an explanation why the machine was or was not killed. It's pretty lightweight because it does not use any additional infrastructure like Lambda.
Here is the script and the guide on installing it! It's just a simple lifecycle configuration!
Upvotes: 5
Reputation: 23002
You can use Lifecycle configurations to set up an automatic job that will stop your instance after inactivity.
There's a GitHub repository which has samples that you can use. In the repository, there's a auto-stop-idle script which will shutdown your instance once it's idle for more than 1 hour.
What you need to do is
If you think 1 hour is too long you can tweak the script. This line has the value.
Upvotes: 35
Reputation: 393
You could also use CloudWatch + Lambda to monitor Sagemaker and stop when your utilization hits a minimum. Here is a list of what's available in CW for SM: https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html.
For example, you could set a CW alarm to trigger when CPU utilization falls below ~5% for 30 minutes and have that trigger a Lambda which would shut down the notebook.
Upvotes: 4
Reputation: 222
Unfortunately, automatically stopping the Notebook Instance when there is no activity is not possible in SageMaker today. To avoid leaving them overnight, you can write a cron job to check if there's any running Notebook Instance at night and stop them if needed.
Upvotes: 1