Reputation: 13778
I am trying to use a 2D CNN to do text classification on Chinese articles and am having some trouble with Keras' Convolution2D. I know the basic flow of Convolution2D for images, but I am stuck using my own dataset with Keras. This is one of my problems:
I have 9800 Chinese articles, each labeled either negative or non-negative (note: non-negative may mean positive or neutral), so it is just a binary classification problem. I already tested a Convolution1D network, but the result is not good.
I use a tokenizer and word2vec to transform the articles into an array of shape (9800, 6810, 200). The longest article has 6810 words and the shortest fewer than 50, so every article needs to be padded to length 6810; 200 is the word2vec vector size (some people seem to call it embedding_size?). The format looks like:
1 [[word2vec size=200], [word2vec size=200], [word2vec size=200], [word2vec size=200], [word2vec size=200], [word2vec size=200]]
2 [[word2vec size=200], [word2vec size=200], [word2vec size=200], [word2vec size=200], [word2vec size=200], [word2vec size=200]]
....
9800 [[word2vec size=200], [word2vec size=200], [word2vec size=200], [word2vec size=200], [word2vec size=200], [word2vec size=200]]
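For reference, this is roughly how I build that array (a minimal sketch; w2v and articles are placeholder names for my trained gensim word2vec model and my tokenized articles):

import numpy as np

MAX_LEN = 6810     # longest article, in words
EMBED_SIZE = 200   # word2vec vector size

def articles_to_tensor(articles, w2v):
    # Zero-pad (or truncate) each tokenized article to MAX_LEN word vectors.
    data = np.zeros((len(articles), MAX_LEN, EMBED_SIZE), dtype=np.float32)
    for i, tokens in enumerate(articles):
        for j, token in enumerate(tokens[:MAX_LEN]):
            if token in w2v:        # skip out-of-vocabulary words
                data[i, j] = w2v[token]
    return data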
Is the maximum article length of 6810 words too large? I have to reduce the 9800 samples to 6500 to avoid a MemoryError, because 6500 samples already eat all of my 32 GB of RAM (a float32 array of shape (9800, 6810, 200) alone needs roughly 53 GB). Is there any way to optimize memory usage other than trimming all articles to a shorter length?
Upvotes: 4
Views: 2158
Reputation: 57709
The Keras FAQ already partly answers this question. You can load your data in chunks using model.fit_generator(). The generator runs in a separate thread and produces your mini-batches, possibly loading them from your archive one by one, so everything never has to sit in RAM at once.
The code for using this would roughly look like this:
def train_generator():
    while True:
        chunk = read_next_chunk_of_data()                # e.g. load one file from disk
        x, y = extract_training_data_from_chunk(chunk)
        yield (x, y)

# steps_per_epoch = the number of batches that make up one epoch
# (older Keras versions instead take samples_per_epoch, counted in samples)
model.fit_generator(generator=train_generator(),
                    steps_per_epoch=steps_per_epoch)
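Applied to the data from your question, the generator can build each mini-batch array on the fly, so the full (9800, 6810, 200) array never has to exist in memory at once. A rough sketch of that idea, where vectorize_batch() is a hypothetical helper that turns a batch of tokenized articles into a (batch_size, 6810, 200) float32 array (e.g. by looking up word2vec vectors and zero-padding):

import numpy as np

BATCH_SIZE = 32

def train_generator(articles, labels):
    # Loop forever; Keras stops after steps_per_epoch batches per epoch.
    n = len(articles)
    while True:
        for start in range(0, n, BATCH_SIZE):
            batch = articles[start:start + BATCH_SIZE]
            x = vectorize_batch(batch)  # hypothetical: (len(batch), 6810, 200)
            y = np.asarray(labels[start:start + BATCH_SIZE])
            yield (x, y)

model.fit_generator(generator=train_generator(articles, labels),
                    steps_per_epoch=int(np.ceil(len(articles) / BATCH_SIZE)))

This way only one batch of word vectors is materialized at a time, at the cost of recomputing the lookups each epoch.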
Upvotes: 5