Reputation: 51
I am trying to run a language modeling program. With training data of 15,000 sentences in one document, the program runs properly. But when I switch to a larger dataset (10 times bigger), it fails with the error below:
Traceback (most recent call last):
File "<ipython-input-2-aa5ef9098286>", line 1, in <module>
runfile('C:/Users/cerdas/Documents/Bil/Lat/lstm-plato-lm/platolm.py', wdir='C:/Users/cerdas/Documents/Bil/Lat/lstm-plato-lm')
File "C:\Users\cerdas\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\Users\cerdas\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/cerdas/Documents/Bil/Lat/lstm-plato-lm/platolm.py", line 35, in <module>
y = to_categorical(y, num_classes=vocab_size)
File "C:\Users\cerdas\Anaconda3\lib\site-packages\keras\utils\np_utils.py", line 30, in to_categorical
categorical = np.zeros((n, num_classes), dtype=np.float32)
MemoryError
Here is the line I suspect is causing the error:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
and also this line in np_utils:
categorical = np.zeros((n, num_classes), dtype=np.float64)
I searched for solutions to similar problems and found that I should change categorical_crossentropy to sparse_categorical_crossentropy. I did that, but it still fails with the same traceback.
Thanks
Upvotes: 1
Views: 1978
Reputation: 56377
If you switch to the sparse categorical cross-entropy loss, you no longer need the to_categorical call, which is the line that actually raises the error (see the last frames of your traceback). Sparse categorical cross-entropy takes the integer class indices directly, so it should work for this.
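To make the point above concrete, here is a minimal NumPy-only sketch (not the Keras implementation). It estimates how much memory the dense one-hot matrix built by `np.zeros((n, num_classes), dtype=np.float32)` would need, and checks that a cross-entropy computed from integer labels matches the one computed from one-hot targets. The helper names and the sample sizes are hypothetical; the question does not state the real `n` or `vocab_size`.

```python
import numpy as np

def one_hot_bytes(n_samples, num_classes, dtype=np.float32):
    """Bytes needed by a dense (n_samples, num_classes) one-hot matrix."""
    return n_samples * num_classes * np.dtype(dtype).itemsize

def sparse_cross_entropy(probs, labels):
    """Mean cross-entropy from integer class ids (no one-hot matrix)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())

def dense_cross_entropy(probs, one_hot):
    """Mean cross-entropy from one-hot targets, for comparison."""
    return float(-(one_hot * np.log(probs)).sum(axis=1).mean())

# Hypothetical sizes, NOT taken from the question.
n_samples, vocab_size = 1_000_000, 20_000
print(one_hot_bytes(n_samples, vocab_size) / 1e9, "GB for one-hot targets")
print(n_samples * np.dtype(np.int32).itemsize / 1e6, "MB for integer targets")

# Tiny check that both formulations give the same loss.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = np.array([0, 3, 1, 4])
one_hot = np.eye(5)[labels]
assert np.isclose(sparse_cross_entropy(probs, labels),
                  dense_cross_entropy(probs, one_hot))
```

In the Keras script this would presumably mean deleting the `y = to_categorical(y, num_classes=vocab_size)` line, keeping `y` as integer class ids, and compiling with `loss='sparse_categorical_crossentropy'`.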
Upvotes: 2
Reputation: 2621
I think this error is expected. The real issue here is that you don't have enough space to allocate 1) the parameter matrix of the decision layer, and/or 2) the intermediate tensor.
The parameter matrix has the shape input_feat_dim x output_num_classes. As you can see, this matrix will consume a huge amount of memory when the vocabulary is large.
To train a network, we also need to keep intermediate tensors for backpropagation (BP), which will be even bigger: batch_size x input_feat_dim x output_num_classes.
So one thing you can try very quickly is to reduce your batch_size to 1/10 of its current value. Of course, you can't make the batch size too small. If it gets too small, you may want to accumulate gradients over several batches and only update the weights after enough samples have been seen.
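The gradient accumulation idea can be sketched roughly as follows. This is a NumPy-only illustration on a toy linear model with mean squared error, not the questioner's LSTM and not a Keras API. The functions `grad_mse` and `accumulated_grad` are hypothetical names introduced for the example. It shows that averaging gradients over equal-sized micro-batches reproduces the full-batch gradient, so a small batch_size does not have to change the effective update.

```python
import numpy as np

def grad_mse(w, x, y):
    """Gradient of the mean squared error of a linear model x @ w."""
    return 2.0 * x.T @ (x @ w - y) / len(x)

def accumulated_grad(w, x, y, micro_batch):
    """Average the gradients of equal-sized micro-batches."""
    total = np.zeros_like(w)
    steps = 0
    for start in range(0, len(x), micro_batch):
        stop = start + micro_batch
        total += grad_mse(w, x[start:stop], y[start:stop])
        steps += 1
    return total / steps

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))
y = rng.normal(size=32)
w = rng.normal(size=3)

full = grad_mse(w, x, y)                      # one large batch of 32
acc = accumulated_grad(w, x, y, micro_batch=4)  # eight micro-batches of 4
assert np.allclose(full, acc)
```

Only one micro-batch has to be held in memory at a time, at the cost of more forward and backward passes per weight update. The equality holds here because all micro-batches have the same size; with a ragged final batch the gradients would need to be weighted by batch size.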
Upvotes: 0