jiashenC

Reputation: 1932

AWS SageMaker PyTorch: no module named 'sagemaker'

I have deployed a PyTorch model on AWS with SageMaker, and when I try to send a request to test the service, I get a very vague error message saying "no module named 'sagemaker'". I have searched online but could not find posts about a similar message.

My client code:

import numpy as np
from sagemaker.pytorch.model import PyTorchPredictor

ENDPOINT = '<endpoint name>'

predictor = PyTorchPredictor(ENDPOINT)
predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())

Detailed error message:

Traceback (most recent call last):
  File "client.py", line 7, in <module>
    predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/sagemaker/predictor.py", line 110, in predict
    response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 276, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 586, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "No module named 'sagemaker'". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/<endpoint name> in account xxxxxxxxxxxxxx for more information.

This bug occurs because I merged the serving script and my deploy script together; see below:

import os
import torch
import numpy as np
from sagemaker.pytorch.model import PyTorchModel  # this import fails inside the serving container
from torch import cuda
from torchvision.models import resnet50


def model_fn(model_dir):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model = resnet50()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)

def predict_fn(input_data, model):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model.eval()
    with torch.no_grad():
        return model(input_data.to(device))


if __name__ == '__main__':
    pytorch_model = PyTorchModel(model_data='s3://<bucket name>/resnet50/model.tar.gz',
                                 entry_point='serve.py', role='jiashenC-sagemaker',
                                 py_version='py3', framework_version='1.3.1')
    predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)
    print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))

The root cause is the 4th line in my script: it tries to import sagemaker, a library that is not available inside the serving container.

Upvotes: 1

Views: 8347

Answers (2)

Jayanth MKV

Reputation: 313

The execution environment doesn't have the 'sagemaker' module installed, so you can add it explicitly as a .zip file.

Your AWS Lambda function’s code comprises a .py file containing your function’s handler code, together with any additional packages and modules your code depends on. To deploy this function code to Lambda, you use a deployment package. This package may either be a .zip file archive or a container image.

You can follow this as well: Docs

  1. Create your deployment package as a .zip file archive using your command-line tool.

  2. Copy the zip file to an S3 bucket.

  3. Add a layer: a layer is a separate .zip file that can contain additional code and other content. The Lambda Python runtime includes the AWS SDK for Python (Boto3) and its dependencies; Lambda provides the SDK in the runtime for deployment scenarios where you are unable to add your own dependencies.

  4. Upload the zip file to the S3 bucket, add the bucket URI, and create the layer. Remember, you can also upload the zip file from your local machine.

  5. Add the layer to the Lambda function (the Layers section is further down the function page), click Save, and the layer is added. A boto3 sketch of steps 2 to 5 follows below.
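
In case it helps, here is a minimal sketch of those steps using boto3. The bucket, key, layer and function names are hypothetical placeholders, and it assumes the layer .zip was already built locally with the dependencies under a python/ directory (for example via pip install sagemaker -t python followed by zip -r sagemaker-layer.zip python):

import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

# Upload the packaged dependencies to S3 (hypothetical bucket/key)
s3.upload_file('sagemaker-layer.zip', '<bucket name>', 'layers/sagemaker-layer.zip')

# Publish a layer version from the uploaded archive
layer = lambda_client.publish_layer_version(
    LayerName='sagemaker-sdk',  # hypothetical layer name
    Content={'S3Bucket': '<bucket name>', 'S3Key': 'layers/sagemaker-layer.zip'},
    CompatibleRuntimes=['python3.7'])

# Attach the new layer version to the function so its code can import sagemaker
lambda_client.update_function_configuration(
    FunctionName='<function name>',  # hypothetical function name
    Layers=[layer['LayerVersionArn']])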

Upvotes: 0

Olivier Cruchant

Reputation: 4037

(edit 2/9/2020 with extra code snippets)

Your serving code tries to use the sagemaker module internally. The sagemaker module (also called the SageMaker Python SDK, one of the numerous orchestration SDKs for SageMaker) is not designed to be used in model containers, but outside of them, to orchestrate model activity (training, deployment, Bayesian tuning, etc.). In your specific example, you shouldn't include the deployment and model-invocation code in your server code, as those are actions conducted from outside the server to orchestrate its lifecycle and interact with it. For model deployment with the SageMaker PyTorch container, your entry point script just needs to contain the required model_fn function for model deserialization, and optionally an input_fn, predict_fn and output_fn, respectively for pre-processing, inference and post-processing (detailed in the documentation here). This logic is beautiful :) : you don't need anything else to deploy a production-ready deep learning server! (MMS in the case of PyTorch and MXNet, Flask + Gunicorn in the case of sklearn.)

In summary, this is how your code should be split:

An entry_point script serve.py that contains model serving code and looks like this:

import os

import numpy as np
import torch
from torch import cuda
from torchvision.models import resnet50

def model_fn(model_dir):
    # TODO instantiate a model from its artifact stored in model_dir
    return model

def predict_fn(input_data, model):
    # TODO apply model to the input_data, return result of interest
    return result
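
For instance, filling in those TODOs with the ResNet-50 code from the question (and assuming, as there, that the artifact is stored as model.pth inside model.tar.gz), serve.py could look like:

import os

import torch
from torch import cuda
from torchvision.models import resnet50

def model_fn(model_dir):
    # Deserialize the ResNet-50 weights shipped in the model.tar.gz artifact
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model = resnet50()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)

def predict_fn(input_data, model):
    # Run inference without tracking gradients
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model.eval()
    with torch.no_grad():
        return model(input_data.to(device))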

And some orchestration code to instantiate a SageMaker Model object, deploy it to a server and query it. This is run from the orchestration runtime of your choice, which could be a SageMaker notebook, your laptop, an AWS Lambda function, an Apache Airflow operator, etc., and with the SDK of your choice; you don't need to use Python for this.

import numpy as np
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data='s3://<bucket name>/resnet50/model.tar.gz',
    entry_point='serve.py',
    role='jiashenC-sagemaker',
    py_version='py3',
    framework_version='1.3.1')

predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)

print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))
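
Once the endpoint is deployed, a separate client process can reconnect to it by name and query it, along the lines of the client code in the question (passing a numpy array, as in the deploy snippet above, so the default serializer sets the content type):

import numpy as np
from sagemaker.pytorch.model import PyTorchPredictor

# Reconnect to the already-deployed endpoint by name
predictor = PyTorchPredictor('<endpoint name>')

# Send a batch of one 3x224x224 image as a float32 numpy array
print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))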

Upvotes: 4
