ljl97114

Reputation: 35

SageMaker model evaluation

The Amazon documentation lists several approaches to evaluate a model (e.g. cross validation, etc.), however these methods do not seem to be available in the SageMaker Java SDK. Currently, if we want to do 5-fold cross validation, it seems the only option is to create 5 models (and also deploy 5 endpoints), one model for each subset of the data, and manually compute the performance metrics (recall, precision, etc.).

This approach is not very efficient and can also be expensive, since it requires deploying k endpoints, one for each fold of the k-fold cross validation.

Is there another way to test the performance of a model?

Upvotes: 1

Views: 1583

Answers (1)

Guy

Reputation: 12891

Amazon SageMaker is a set of components, and you can choose which ones to use.

The built-in algorithms are designed for (infinite) scale, which means that you can have huge datasets and still build a model with them quickly and at low cost. Once you have large datasets you usually don't need to use techniques such as cross-validation, and the recommendation is to have a clear split between training data and validation data. Each of these parts is defined as an input channel when you submit a training job.
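For example, here is a minimal sketch of submitting a training job with separate train and validation channels using boto3. The role ARN, S3 URIs, image URI, and instance type are placeholders you would replace with your own values:

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_training_job(
    TrainingJobName="my-training-job",
    AlgorithmSpecification={
        # Placeholder: ECR URI of a built-in algorithm image for your region.
        "TrainingImage": "811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest",
        "TrainingInputMode": "File",
    },
    # Placeholder IAM role that SageMaker assumes to read/write S3.
    RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
    InputDataConfig=[
        {
            # One channel for the training split...
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/train/",
                "S3DataDistributionType": "FullyReplicated",
            }},
            "ContentType": "text/csv",
        },
        {
            # ...and a second channel for the validation split.
            "ChannelName": "validation",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/validation/",
                "S3DataDistributionType": "FullyReplicated",
            }},
            "ContentType": "text/csv",
        },
    ],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/output/"},
    ResourceConfig={
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 10,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
```

With this setup the algorithm reports metrics on the validation channel during training, so no endpoint has to be deployed just to measure model quality.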

If you have a small amount of data and you want to train on all of it using cross-validation, you can use a different part of the service: an interactive notebook instance. You can bring your own algorithm or even a container image to be used for development, training, or hosting. You can run any Python code based on any machine learning library or framework, including scikit-learn, R, TensorFlow, MXNet, etc. In your code, you can define cross-validation based on the training data that you copy from S3 to the worker instances.
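As a minimal sketch of such notebook code, assuming the training data has already been copied from S3 to the instance as a local CSV (the file path, label column, and model choice are placeholders):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Hypothetical local copy of the training data, e.g. downloaded with
# `aws s3 cp s3://my-bucket/train.csv .`
df = pd.read_csv("train.csv")
X, y = df.drop(columns=["label"]), df["label"]

# 5-fold cross-validation runs entirely on the notebook instance:
# no extra models or endpoints need to be deployed.
scores = cross_validate(
    RandomForestClassifier(n_estimators=100),
    X, y, cv=5,
    scoring=["precision", "recall"],
)
print("precision:", scores["test_precision"].mean())
print("recall:   ", scores["test_recall"].mean())
```

This gives the per-fold precision and recall the question asks about, at the cost of a single notebook instance rather than k endpoints.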

Upvotes: 3
