David293836

Reputation: 1215

AWS SageMaker Minimum Configuration

Why do I need a container for AWS SageMaker? If I want to run scikit-learn in SageMaker's Jupyter notebook for self-learning purposes, do I still need to configure a container for it?

What is the minimum configuration I need on SageMaker if I just want to learn scikit-learn? For example, I want to run scikit-learn's decision tree algorithm with a set of training data and a set of test data. What do I need to do on SageMaker to perform these tasks? Thanks.

Upvotes: 9

Views: 951

Answers (2)

Guy C

Reputation: 7270

If you are not concerned about using SageMaker's training and deployment features, then you just need to create a new notebook with the conda_python3 kernel and import sklearn.
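For the decision tree task in the question, a single notebook cell along these lines should be enough; no container configuration is involved. This is a minimal sketch that assumes scikit-learn's bundled Iris dataset as a stand-in for your own training and test data, so swap in your own data as needed.

```python
# Minimal local scikit-learn workflow for a SageMaker notebook cell.
# Assumption: the built-in Iris dataset stands in for your own data.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load features X and labels y.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows as a test set; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the decision tree on the training set only.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Evaluate on the held-out test set.
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")
```

Everything here runs on the notebook instance itself; SageMaker's separate training and hosting machinery is only needed if you want the workflow described next.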

I too was confused about how to take advantage of SageMaker's train/deploy features with scikit-learn. The best and most up-to-date explanation seems to be:

https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/sklearn/README.rst

The brief summary is:

  1. Save your training data to an S3 bucket.
  2. Create a standalone Python script that does your training and serializes the trained model to a file in the model output directory; SageMaker then uploads that directory to an S3 bucket.
  3. In a notebook on SageMaker, import the SageMaker Python SDK and point it at your training script and data. SageMaker then temporarily launches an AWS instance to train the model.
  4. Once training finishes, that instance is automatically terminated.
  5. Finally, use the SageMaker SDK to deploy the trained model to another AWS instance. This also automatically creates an endpoint that can be called to make predictions.
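Step 2 could look roughly like the script below. It is not copied from the linked README; it is a minimal sketch in the "script mode" style that README describes, assuming a training channel named `train` that holds a header-less `train.csv` (hypothetical file name) with the label in the first column. `SM_MODEL_DIR`, `SM_CHANNEL_TRAIN` and `model_fn` follow the SageMaker scikit-learn container conventions, while `train`, `main` and the `--max-depth` hyperparameter are illustrative names, so check the README for your SDK version.

```python
# train.py: hypothetical training script for step 2 (SageMaker script mode style).
# Assumptions: the "train" channel contains train.csv (hypothetical name),
# no header row, label in the first column.
import argparse
import os

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def train(train_dir, model_dir, max_depth=None):
    """Fit a decision tree on train_dir/train.csv and save it to model_dir."""
    data = np.loadtxt(os.path.join(train_dir, "train.csv"), delimiter=",")
    y, X = data[:, 0], data[:, 1:]
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    clf.fit(X, y)
    # Files written to model_dir are what SageMaker packages and uploads to S3.
    os.makedirs(model_dir, exist_ok=True)
    joblib.dump(clf, os.path.join(model_dir, "model.joblib"))
    return clf


def model_fn(model_dir):
    """Load the serialized model; the SageMaker scikit-learn serving container calls this."""
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-depth", type=int, default=None)  # hypothetical hyperparameter
    # Inside a training job these variables point at local copies of the S3
    # input data and at the directory whose contents are uploaded afterwards.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN"))
    args = parser.parse_args(argv)
    train(args.train, args.model_dir, args.max_depth)


# Only train when executed as a script inside a SageMaker training job
# (SM_MODEL_DIR is set there); importing the file, as the serving container
# does to find model_fn, never starts training.
if __name__ == "__main__" and "SM_MODEL_DIR" in os.environ:
    main()
```

The environment-variable check is a design choice for this sketch: it lets you import and test `train` and `model_fn` locally without kicking off a run. To run it by hand, set `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` yourself or call `main([...])` with explicit arguments.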

Upvotes: 2

Pablo

Reputation: 131

You don't need much: just an AWS account with the appropriate permissions on your IAM role. In the AWS SageMaker console you can launch a notebook instance with one click. scikit-learn comes preinstalled, so you can use it out of the box. No special container is needed.

At a minimum, you just need an AWS account with permissions to create EC2 instances and to read from and write to your S3 buckets. That's all, just try it. :)

Use this as a starting point: Amazon SageMaker – Accelerating Machine Learning

You can also access the notebook instance through the Jupyter terminal.

Upvotes: 5
