user_5

Reputation: 576

Error in deploying PyTorch model using SageMaker Pipeline and RegisterModel

Can anyone provide an example of deploying a PyTorch model using a SageMaker Pipeline?

I've used the MLOps template (MLOps template for model building, training and deployment) in SageMaker Studio to build an MLOps project.

The template uses SageMaker Pipelines to build a pipeline that preprocesses the data, trains the model, and registers it. The deployment script is defined in a YAML file and runs via CloudFormation; it is triggered automatically when the model is registered.

The template uses an XGBoost model to train on the data and deploy it. I want to use PyTorch instead. I successfully replaced XGBoost with PyTorch, and the preprocessing, training, and model registration all work. But I didn't include an inference.py with my model, so the model deployment fails.

The error log when updating the endpoint is:

FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/code/inference.py'

I tried to find an example of using inference.py with a PyTorch model, but I couldn't find one that uses SageMaker Pipelines and RegisterModel.

Any help would be appreciated.

Below you can see a part of the pipeline for training and registering the model.

import os

import sagemaker
from sagemaker.pytorch.estimator import PyTorch
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import (
    ProcessingStep,
    TrainingStep,
)
from sagemaker.workflow.step_collections import RegisterModel

pytorch_estimator = PyTorch(
    entry_point=os.path.join(BASE_DIR, 'train.py'),
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role=role,
    framework_version='1.8.0',
    py_version='py3',
    hyperparameters={'epochs': 5, 'batch-size': 64, 'learning-rate': 0.1},
)

step_train = TrainingStep(
    name="TrainModel",
    estimator=pytorch_estimator,
    inputs={
        "train": sagemaker.TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "train_data"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "dev": sagemaker.TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "dev_data"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "test": sagemaker.TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "test_data"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
    },
)
step_register = RegisterModel(
    name="RegisterModel",
    estimator=pytorch_estimator,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.large"],
    transform_instances=["ml.m5.large"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
)
    
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type,
        processing_instance_count,
        training_instance_type,
        model_approval_status,
        input_data,
    ],
    steps=[step_process, step_train, step_register],
    sagemaker_session=sagemaker_session,
)

Upvotes: 1

Views: 769

Answers (2)

Cris Pineda

Reputation: 31

I had a similar issue, but I think it has something to do with how the PyTorch container is deployed for training vs. inference.

I followed this code here. Basically, you need to recreate the model with PyTorchModel instead of using the PyTorch estimator.

Here is the snippet:

import os

from sagemaker.pytorch import PyTorchModel
from sagemaker.workflow.model_step import ModelStep

# Recreate the model with PyTorchModel so the inference entry point gets packaged
model = PyTorchModel(
    entry_point="infer.py",
    source_dir=os.path.join(BASE_DIR, "sagemaker_intel"),
    image_uri="441249477288.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-inference",
    sagemaker_session=pipeline_session,
    role=role,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    framework_version="1.11.0",
)

# Register the model package via the model, then wrap the call in a ModelStep
model_step_args = model.register(
    content_types=["application/x-npy"],
    response_types=["application/json"],
    inference_instances=["ml.t2.medium", "ml.t2.large"],
    transform_instances=["ml.m4.xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics,
)

step_register = ModelStep(
    name="RegisterIntelClassifierModel",
    step_args=model_step_args,
)
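
For completeness, pipeline_session here comes from the pipeline context, and the new ModelStep just takes the place of your old RegisterModel step in the pipeline definition. A rough sketch, assuming the same step and parameter names as in your question:

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession

# Session that defers SDK calls so they run as pipeline steps
# (create it before constructing PyTorchModel above)
pipeline_session = PipelineSession()

# Same pipeline as in the question, with the ModelStep as the register step
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type,
        processing_instance_count,
        training_instance_type,
        model_approval_status,
        input_data,
    ],
    steps=[step_process, step_train, step_register],
    sagemaker_session=pipeline_session,
)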

Let me know if this works.

Upvotes: 0

이규민

Reputation: 1

The PyTorch API uses the base PyTorch images. When the sagemaker.pytorch deploy method is called, SageMaker runs '/opt/ml/model/code/inference.py'.

But your base image does not have that file.

So if you want to use the deploy method, you need to write an 'inference.py' in SageMaker style (one that can execute in the SageMaker container), then build and push the image.

And then you can use the deploy method!
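
A minimal sketch of what such an inference.py might look like (the handler names follow the SageMaker PyTorch serving convention; MyNet, the my_model module, and 'model.pth' are placeholders for your own network class and artifact):

import json
import os

import torch

from my_model import MyNet  # placeholder: wherever your network class lives


def model_fn(model_dir):
    """Load the model from the unpacked model.tar.gz in /opt/ml/model."""
    model = MyNet()
    with open(os.path.join(model_dir, "model.pth"), "rb") as f:
        model.load_state_dict(torch.load(f, map_location="cpu"))
    model.eval()
    return model


def input_fn(request_body, request_content_type):
    """Deserialize the request payload into a tensor."""
    if request_content_type == "application/json":
        return torch.tensor(json.loads(request_body), dtype=torch.float32)
    raise ValueError("Unsupported content type: " + request_content_type)


def predict_fn(input_data, model):
    """Run the forward pass."""
    with torch.no_grad():
        return model(input_data)


def output_fn(prediction, accept):
    """Serialize the prediction for the response."""
    return json.dumps(prediction.tolist())

The container calls these four handlers in order: load the model, deserialize the request, predict, and serialize the response.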

Here is some sample code: https://sagemaker-workshop.com/custom/containers.html

Upvotes: 0
