Reputation: 167
When using a p2.xlarge or p3.2xlarge (with up to 1TB of memory) and trying to use the predefined SageMaker Image Classification algorithm in a training job, I'm getting the following error:
ClientError: Out of Memory. Please use a larger instance and/or reduce the values of other parameters (e.g. batch size, number of layers etc.) if applicable
I'm using 450+ images. I've tried resizing them from their original 2000x3000px size down to 244x244px, and even as far as 24x24px, but I keep getting the same error.
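For context, the resizing was done with a script along these lines (a minimal Pillow sketch; the folder names and target size are illustrative, not my exact setup):

```python
from pathlib import Path
from PIL import Image

# Illustrative paths and size -- adjust to the actual pipeline.
SRC, DST = Path("images"), Path("images_resized")
TARGET = (244, 244)  # also tried sizes down to 24x24

DST.mkdir(exist_ok=True)
for path in SRC.glob("*.jpg"):
    with Image.open(path) as img:
        # Downscale from the original 2000x3000px and save a copy.
        img.resize(TARGET).save(DST / path.name)
```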
I've tried adjusting my hyperparameters: num_classes, num_layers, num_training_samples, optimizer, image_shape, checkpoint_frequency, batch_size, and epochs. I've also tried using a pretrained model. The same error keeps occurring.
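For reference, my training-job setup looks roughly like this (a minimal sketch using the SageMaker Python SDK v2; the role ARN, S3 paths, and hyperparameter values are placeholders rather than my exact configuration):

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder role ARN

# Built-in Image Classification container for the session's region.
container = image_uris.retrieve("image-classification", session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://my-bucket/output",  # placeholder bucket
    sagemaker_session=session,
)

estimator.set_hyperparameters(
    num_classes=2,                 # illustrative values throughout
    num_layers=18,
    num_training_samples=450,
    image_shape="3,224,224",
    mini_batch_size=8,             # the algorithm's name for batch size is mini_batch_size
    epochs=10,
    optimizer="sgd",
    checkpoint_frequency=1,
    use_pretrained_model=1,
)

estimator.fit({
    "train": "s3://my-bucket/train",            # placeholder channels
    "validation": "s3://my-bucket/validation",
})
```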
Upvotes: 1
Views: 4365
Reputation: 1875
Would've added this as a comment but I don't have enough rep yet.
A few clarifying questions so that I can have some more context:
1. How exactly are you achieving 1TB of RAM? p2.xlarge instances have 61GB of RAM, and p3.2xlarge instances have 61GB of RAM plus 16GB onboard the Tesla V100 GPU.
2. How are you storing, resizing, and ingesting the images into the SageMaker algorithm?
Upvotes: 2