Zach

Reputation: 41

How do I deploy an ML model trained on SageMaker to a local machine to run predictions?

I've been looking around at various posts about deploying SageMaker models locally, but the approaches they describe have to be tied to an AWS notebook instance in order to run predict/serve locally (via the AWS SageMaker Python SDK). This defeats the purpose of running the SageMaker-trained model fully offline. Others have tried unpickling the tar.gz file on S3 and wrapping the contents for local deployment, but that process seems restricted to certain types of models such as XGBoost and MXNet. So, is there any way to deploy a SageMaker-trained model offline without a dependency on a SageMaker notebook instance? Any advice would be appreciated. Thank you.

Upvotes: 4

Views: 6367

Answers (2)

Derek Haynes

Reputation: 33

I've deployed PyTorch models locally via Amazon SageMaker Local Mode. I believe the same process works for other ML frameworks that have official SageMaker containers. You can run the same Docker containers locally that SageMaker uses when deploying your model on AWS infrastructure.

The docs for deploying a SageMaker endpoint locally for inference are a bit scattered. A summary:

  1. Use local versions of the API clients: normally you use the botocore.client.SageMaker and botocore.client.SageMakerRuntime classes to work with SageMaker from Python. To use SageMaker locally, use sagemaker.local.LocalSagemakerClient() and sagemaker.local.LocalSagemakerRuntimeClient() instead.
  2. You can use a local tar.gz model file if you wish.
  3. Set the instance_type to local when deploying the model (see the sketch after this list).
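
To make those steps concrete, here is a minimal local-mode sketch using the SageMaker Python SDK. It assumes a PyTorch model artifact and an inference script of your own; the file paths, role ARN, and framework/Python versions below are placeholders, not values from the original post.

    # Minimal local-mode sketch (assumed paths, role ARN, and versions).
    import numpy as np
    from sagemaker.local import LocalSession
    from sagemaker.pytorch import PyTorchModel

    session = LocalSession()
    session.config = {"local": {"local_code": True}}  # run everything on this machine

    model = PyTorchModel(
        model_data="file://./model.tar.gz",           # local tar.gz instead of an S3 URI
        role="arn:aws:iam::111111111111:role/dummy",  # not actually used locally, but required
        entry_point="inference.py",                   # your model_fn / predict_fn script
        framework_version="1.13",                     # match the version you trained with
        py_version="py39",
        sagemaker_session=session,
    )

    # instance_type="local" tells the SDK to start the serving container with Docker
    predictor = model.deploy(initial_instance_count=1, instance_type="local")

    sample = np.zeros((1, 3, 224, 224), dtype="float32")  # placeholder input shape
    print(predictor.predict(sample))

    predictor.delete_endpoint()  # stops the local container

Docker must be installed and running for local mode, since the SDK pulls and runs the same serving container image that a hosted endpoint would use.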

I wrote How to setup a local AWS SageMaker environment for PyTorch, which goes into detail on how this works.

Upvotes: 1

Gili Nachum

Reputation: 5578

Once you have trained a model using Amazon SageMaker you'll have a Model entry. The model will point to a model artifact in S3. This tar.gz file contains the model weights. The format of the file depends on the framework (TensorFlow/PyTorch/MXNet/...) you used to train the model. If you used SageMaker built-in algorithms, most of them are implemented with MXNet or XGBoost, so you could use the relevant model-serving software to run the model.
If you need serving software, you could run the SageMaker deep learning containers in inference mode on your local inference server, use open-source serving software like TensorFlow Serving, or load the model in memory (see the sketch below).
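
For example, loading the artifact in memory without any serving software might look like the following sketch. The bucket name, object key, and file name inside the archive are placeholders, and what torch.load returns depends on what your training script saved.

    # Sketch: pull the artifact from S3, unpack it, and load it in memory.
    # Bucket, key, and the file name inside the archive are placeholders.
    import tarfile
    import boto3
    import torch

    s3 = boto3.client("s3")
    s3.download_file("my-bucket", "my-training-job/output/model.tar.gz", "model.tar.gz")

    with tarfile.open("model.tar.gz") as tar:
        tar.extractall("model")  # e.g. model/model.pth for PyTorch, saved_model/ for TensorFlow

    # torch.load returns whatever the training script saved: a full model or a state_dict
    obj = torch.load("model/model.pth", map_location="cpu")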

Upvotes: 0
