Reputation: 3355
I'm using Keras to train a model on SageMaker, but I hit this error:
MemoryError: Unable to allocate 381. MiB for an array with shape (25000, 2000) and data type float64
Here's the code:
import pandas as pd
import numpy as np
from keras.datasets import imdb
from keras import models, layers, optimizers, losses, metrics
import matplotlib.pyplot as plt

# load the preprocessed IMDB dataset
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(
    num_words=2000)

# one-hot encode the integer sequences into a binary matrix
def vectorize_sequences(sequences, dimension=2000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
Then I get the error above.
The first time I ran this code it worked, but it fails when I try to re-run it. How can I fix this by freeing memory, or is there a way to make better use of the memory on SageMaker?
Upvotes: 1
Views: 1991
Reputation: 36684
I wouldn't know about SageMaker or AWS specifically, but something you can do is cast your input to float32, which takes half the memory of float64. You can cast the vectorized input like this:
x_train = tf.cast(x_train, tf.float32)
float32 is the default dtype of TensorFlow weights, so you don't need float64 anyway. Proof:
import tensorflow as tf
layer = tf.keras.layers.Dense(8)
print(layer(tf.random.uniform((10, 100), 0, 1)).dtype)
<dtype: 'float32'>
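Along the same lines, if you keep the question's NumPy helper you can allocate the one-hot matrix as float32 up front, so the float64 array is never created at all (roughly 190 MiB instead of 381 MiB). A minimal sketch adapting the question's vectorize_sequences; the only change is the dtype argument:
import numpy as np

def vectorize_sequences(sequences, dimension=2000):
    # allocate as float32 instead of NumPy's default float64: half the memory
    results = np.zeros((len(sequences), dimension), dtype=np.float32)
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results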
My other suggestions are to use fewer words from your dataset, or not to one-hot encode the sequences at all. If you're planning on training a recurrent model with an embedding layer, you won't need to one-hot encode them anyway.
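A minimal sketch of that last suggestion, assuming padded sequences of length 200 and illustrative layer sizes (none of these values come from the original answer):
import numpy as np
from keras.datasets import imdb
from keras import models, layers
from keras.preprocessing.sequence import pad_sequences

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=2000)

# pad/truncate each review to 200 integer word indices instead of one-hot encoding
x_train = pad_sequences(train_data, maxlen=200)
x_test = pad_sequences(test_data, maxlen=200)

model = models.Sequential([
    layers.Embedding(input_dim=2000, output_dim=32),  # learns dense word vectors
    layers.LSTM(32),                                  # recurrent layer over the sequence
    layers.Dense(1, activation="sigmoid"),            # binary sentiment output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, train_labels, epochs=3, batch_size=128, validation_split=0.2)
The padded training set here is 25000 x 200 int32 values (about 20 MB), which sidesteps the 381 MiB float64 allocation entirely.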
Upvotes: 4