mon

Reputation: 22356

SageMaker - clarification on SageMaker entities in CloudFormation

Question

I would like to clarify the entities in AWS::SageMaker.

SageMaker Model

Looking at the diagram in Deploy a Model on Amazon SageMaker Hosting Services, the model artifacts in SageMaker are the data generated by an ML algorithm Docker container during the model training phase and stored in an S3 bucket.


However, AWS::SageMaker::Model seems to capture a Docker image that runs the inference code in a SageMaker endpoint instance; there appears to be no reference to the model data in an S3 bucket. Hence I wonder why it is called AWS::SageMaker::Model and not something like AWS::SageMaker::InferenceImage.

1-1. What is a Model in AWS SageMaker?

1-2. Is it a Docker image (algorithm) that does the prediction/inference, rather than the data the algorithm runs on?

1-3. Does AWS refer to the runtime (Docker runtime + Docker image for inference) as a Model?

AWS::SageMaker::Model

Type: AWS::SageMaker::Model
Properties: 
  Containers: 
    - ContainerDefinition
  ExecutionRoleArn: String
  ModelName: String
  PrimaryContainer: 
    ContainerDefinition
  Tags: 
    - Tag
  VpcConfig: 
    VpcConfig

SageMaker Endpoint or SageMaker Estimator from model data in S3

The SageMaker Estimator has an output_path argument, as described in Python SDK Estimators.

S3 location for saving the training result (model artifacts and output files). If not specified, results are stored to a default bucket. If the bucket with the specific name does not exist, the estimator creates the bucket during the fit() method execution.
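
As a concrete example, here is a minimal sketch of passing output_path to an Estimator with the SageMaker Python SDK (the image URI, role ARN and bucket names below are placeholders, not values from my setup):

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    # S3 location where the training result (model.tar.gz) is saved
    output_path="s3://my-bucket/model-output/",
)
estimator.fit({"training": "s3://my-bucket/training-data/"})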

In a Python ML environment, we can use pickle to export the model data and reload it back into a model, as in 3.4. Model persistence. We can do the same for Spark ML.
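
For instance, a minimal sketch with scikit-learn (the classifier choice is arbitrary, just to illustrate the round trip):

import pickle
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Export the trained model to a file ...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ... and reload it back into a usable model later.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict(X[:3]))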

2-1. What is the equivalent in SageMaker, given that AWS::SageMaker::Model has no argument that refers to data in an S3 bucket?

2-2. Can a SageMaker Estimator be re-created using the model data in an S3 bucket?

SageMaker Estimator

I thought there would be a resource to define a SageMaker Estimator in CloudFormation, but it looks like there is none.

3-1. Please help me understand whether there is a reason for this.

Upvotes: 1

Views: 617

Answers (2)

Tobias Senst

Reputation: 2840

I will try to provide a little more context. Perhaps what causes some confusion is that you mixed up the SageMaker Python SDK with the actual SageMaker API. Since the SDK is implemented on top of the API calls, constructs from the SDK are not reflected in the API:

1-1: A model in SageMaker references both the model artifact and the inference image, and both have to be given in the container specification of a model. See the CreateModel API:

"Containers": [ 
      { 
         ...
         "Image": "string",
         "ModelDataUrl": "string",
         "ModelPackageName": "string",
         ...
      }
   ],

ModelDataUrl is the S3 location of your packed model artifact, e.g. model.tar.gz, and Image gives the location of your image containing the inference code. If you use CreateTrainingJob, everything you store in the container's local /opt/ml/model folder will be compressed and stored to the S3OutputPath you set up. Another way to specify the model artifact and image is to register the model in the SageMaker model registry; when creating the model you then provide the model package ARN in the field ModelPackageName, and the model package contains the image and model artifact. (Note: either Image and ModelDataUrl, or only ModelPackageName, can be used to create a model.) To create a SageMaker Model with CDK, use this class: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_sagemaker.CfnModel.html
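
For illustration, a minimal sketch of the same CreateModel call via boto3 (the image URI, artifact path and role ARN are placeholders):

import boto3

sm = boto3.client("sagemaker")
sm.create_model(
    ModelName="my-model",
    PrimaryContainer={
        # inference image and packed model artifact, referenced together
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest",
        "ModelDataUrl": "s3://my-bucket/model-output/my-training-job/output/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
)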

1-2: The Docker image for prediction has to implement a REST service that is able to handle the request /ping for liveness checks and the request /invocations, which carries the payload. To set up such a service you can use e.g. FastAPI or the Multi Model Server from Amazon. See Use your own inference code. You may also have the training and inference code in one image; to indicate that the image should run in inference mode, SageMaker starts the container with the argument serve.
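
A minimal sketch of such a service, here with Flask rather than FastAPI or the Multi Model Server; the artifact name model.pkl and the JSON payload format are assumptions about what your training job saved and what your clients send:

import json
import os
import pickle
from flask import Flask, Response, request

app = Flask(__name__)

# SageMaker copies the decompressed model artifact to /opt/ml/model
with open(os.path.join("/opt/ml/model", "model.pkl"), "rb") as f:
    model = pickle.load(f)

@app.route("/ping", methods=["GET"])
def ping():
    # liveness check: a 200 response tells SageMaker the container is healthy
    return Response(status=200)

@app.route("/invocations", methods=["POST"])
def invocations():
    # assume the request body is a JSON list of feature rows
    rows = json.loads(request.get_data())
    result = model.predict(rows)  # assumes a scikit-learn-style model
    return Response(json.dumps(result.tolist()), status=200, mimetype="application/json")

if __name__ == "__main__":
    # SageMaker sends traffic to port 8080 inside the container
    app.run(host="0.0.0.0", port=8080)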

2-1. Everything that the training job stores in the local directory /opt/ml/model is compressed and stored as the model artifact model.tar.gz in the S3 output path you provide. It is up to you whether you store additional files there too. When the model is deployed behind a SageMaker endpoint, the Docker container will contain a /opt/ml/model directory into which the decompressed content of the model artifact is copied, and your inference code can load the model from there.
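
On the training side this convention looks like the following minimal sketch (the artifact name model.pkl is an assumption; use whatever your inference code expects):

import os
import pickle

MODEL_DIR = "/opt/ml/model"

def save_model(model):
    # anything written here is packed into model.tar.gz and uploaded
    # to the S3 output path when the training job completes
    os.makedirs(MODEL_DIR, exist_ok=True)
    with open(os.path.join(MODEL_DIR, "model.pkl"), "wb") as f:
        pickle.dump(model, f)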

2-2. I have not seen a SageMaker Estimator re-created from an existing model artifact. What you are looking for is perhaps incremental training. But if you need to initialize training from an existing model rather than from scratch, the best way would be to provide the existing model artifact as an input to your training job, or to use checkpoints. Both allow you to load the model during training and continue from there.
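
A minimal sketch of passing an existing artifact to a new training job as an extra input channel; the channel name "model" and all URIs here are assumptions that your training code would have to honor:

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)
# the artifact shows up under /opt/ml/input/data/model in the container,
# where the training script can decompress and load it before continuing
estimator.fit({
    "training": "s3://my-bucket/training-data/",
    "model": "s3://my-bucket/model-output/previous-job/output/model.tar.gz",
})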

3-1. The Estimator belongs to the SageMaker Python SDK. The SDK contains classes like the Estimator that use the SageMaker API; for example, the Estimator class uses CreateTrainingJob to set up and start a model training job. CloudFormation only contains constructs from the API, not from the SDK. Unfortunately, CDK is also not complete; at the moment it does not support the CreateModelPackage API, but I guess it will in the future.
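
For illustration, a minimal sketch of the raw CreateTrainingJob call that Estimator.fit() wraps (all names and URIs are placeholders):

import boto3

sm = boto3.client("sagemaker")
sm.create_training_job(
    TrainingJobName="my-training-job",
    AlgorithmSpecification={
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
        "TrainingInputMode": "File",
    },
    RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
    InputDataConfig=[{
        "ChannelName": "training",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/training-data/",
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/model-output/"},
    ResourceConfig={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 50},
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)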

Upvotes: 0

Thom Lane

Reputation: 1063

Clearing up a few concepts to begin with: an Amazon SageMaker Model is a reference to the model artifacts (i.e. the trained model), the associated inference environment (i.e. the Docker container), and the inference source code. An Estimator is used to train the model and outputs the Model Data (i.e. model.tar.gz) used by a Model. A Model doesn't reference training code (so an Estimator cannot be constructed from a Model), and it doesn't reference any inference data either: that is passed to an Endpoint or Batch Transform.

Solving the majority of your issues: you can specify ModelDataUrl on a ContainerDefinition for AWS::SageMaker::Model. You would typically reference the Amazon S3 path to the model.tar.gz which was output from the Amazon SageMaker training job.
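
A minimal sketch of that wiring via boto3 (the job, model and role names are placeholders): look up the artifact location from the completed training job, then reference it as ModelDataUrl.

import boto3

sm = boto3.client("sagemaker")

# where did the training job put model.tar.gz?
job = sm.describe_training_job(TrainingJobName="my-training-job")
model_data_url = job["ModelArtifacts"]["S3ModelArtifacts"]

sm.create_model(
    ModelName="my-model",
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest",
        "ModelDataUrl": model_data_url,
    },
    ExecutionRoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
)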

Upvotes: 1
