Bily

Reputation: 51

MemoryError when calling to_categorical in keras

I am trying to run a language modeling program. When I train on a document of 15,000 sentences, the program runs properly. But when I switch to a larger dataset (10 times bigger), I get the error below:

Traceback (most recent call last):

  File "<ipython-input-2-aa5ef9098286>", line 1, in <module>
    runfile('C:/Users/cerdas/Documents/Bil/Lat/lstm-plato-lm/platolm.py', wdir='C:/Users/cerdas/Documents/Bil/Lat/lstm-plato-lm')

  File "C:\Users\cerdas\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\cerdas\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/cerdas/Documents/Bil/Lat/lstm-plato-lm/platolm.py", line 35, in <module>
    y = to_categorical(y, num_classes=vocab_size)

  File "C:\Users\cerdas\Anaconda3\lib\site-packages\keras\utils\np_utils.py", line 30, in to_categorical
    categorical = np.zeros((n, num_classes), dtype=np.float32)

MemoryError

Here is the suspected line of code:

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

and the corresponding line in np_utils:

categorical = np.zeros((n, num_classes), dtype=np.float64)

I've searched for solutions to similar problems and found that I should change categorical_crossentropy to sparse_categorical_crossentropy. I did that, but it still fails with the same traceback.

Thanks

Upvotes: 1

Views: 1978

Answers (2)

Dr. Snoopy

Reputation: 56377

If you switch to the sparse categorical cross-entropy loss, then you don't need the to_categorical call, which is actually the one raising the error. Sparse categorical cross-entropy should work for this.
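A minimal NumPy sketch of why dropping to_categorical helps (the toy labels and vocabulary size here are illustrative, not the poster's data):

```python
import numpy as np

# Hypothetical integer targets (word indices), as they exist before to_categorical.
y = np.array([3, 1, 4, 1, 5], dtype=np.int64)
vocab_size = 10

# Dense route: to_categorical allocates an (n, vocab_size) float32 matrix,
# which is exactly the np.zeros call that raises MemoryError in the traceback.
y_onehot = np.zeros((len(y), vocab_size), dtype=np.float32)
y_onehot[np.arange(len(y)), y] = 1.0

# Sparse route: the integer vector is passed to the model as-is, so memory
# stays at n * 8 bytes instead of n * vocab_size * 4 bytes.
print(y_onehot.nbytes)  # 200 bytes even for this toy example
print(y.nbytes)         # 40 bytes
```

In the Keras script itself, the change is then to skip the `to_categorical(y, ...)` line entirely and pass `loss='sparse_categorical_crossentropy'` to `model.compile(...)`.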

Upvotes: 2

pitfall

Reputation: 2621

I think this error is expected. The real issue is that you don't have enough memory to allocate 1) the parameter matrix of the decision layer, and/or 2) the intermediate tensors.

The parameter matrix has shape input_feat_dim x output_num_classes, so it consumes a huge amount of memory when the vocabulary is large. To train the network, we also need to keep intermediate tensors for backpropagation, which are even bigger: batch_size x input_feat_dim x output_num_classes.
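A back-of-the-envelope estimate of those two sizes (the dimensions below are hypothetical, chosen only to show the scale):

```python
# Assumed sizes -- not taken from the poster's model.
input_feat_dim = 256        # e.g. LSTM hidden size
output_num_classes = 50_000 # vocabulary size
batch_size = 128
bytes_per_float32 = 4

# 1) parameter matrix of the decision (output) layer
param_bytes = input_feat_dim * output_num_classes * bytes_per_float32

# 2) intermediate tensor kept for backpropagation
interm_bytes = batch_size * input_feat_dim * output_num_classes * bytes_per_float32

print(param_bytes / 1e6)   # ~51 MB for the weight matrix
print(interm_bytes / 1e9)  # ~6.6 GB for the intermediate tensor
```

The intermediate tensor scales linearly with batch_size, which is why shrinking the batch is the first lever to pull.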

So one quick thing you can try is reducing your batch_size to 1/10 of its current value. Of course, you can't set the batch size too small; in that case, you may want to accumulate gradients until you have seen enough samples.
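The accumulation pattern can be sketched like this (a pure-NumPy toy: fake_grad stands in for a real backward pass, and all sizes are placeholders):

```python
import numpy as np

accum_steps = 8  # number of small batches per parameter update

def fake_grad(batch):
    # Stand-in for the gradient computed on one small batch.
    return np.full(3, batch.mean())

params = np.zeros(3)
accum = np.zeros(3)
lr = 0.1
batches = [np.ones(4) * i for i in range(accum_steps)]

for step, batch in enumerate(batches, start=1):
    accum += fake_grad(batch)            # sum gradients over small batches
    if step % accum_steps == 0:
        params -= lr * accum / accum_steps  # one update ~ one large batch
        accum[:] = 0.0                      # reset for the next accumulation
```

Each small batch fits in memory on its own, while the averaged update behaves like a batch accum_steps times larger.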

Upvotes: 0
