Reputation: 11
I get an error when I apply the ELMo embedding to my data. I have 7,255 sentences.
embeddings = embed(
    sentences,
    signature="default",
    as_dict=True)["default"]

# Start a session and run ELMo to return the embeddings in variable x
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
    x = sess.run(embeddings)
The error is:
ResourceExhaustedError: OOM when allocating tensor with shape[36021075,50] and type int32 on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu [[node module_apply_default/map/TensorArrayStack/TensorArrayGatherV3 (defined at C:\Users...\envs\tf_36\lib\site-packages\tensorflow_hub\native_module.py:547) ]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Upvotes: 1
Views: 205
Reputation: 11220
ELMo is a large model. It uses 2048-dimensional word embeddings and 4096-dimensional LSTM states in 2 layers and 2 directions. That alone is 18k floats, about 72 kB per word (and there is much more: intermediate projections in the LSTMs, a character-level CNN for the word representation). You have 7,255 sentences; if the average sentence has 25 words, that gives roughly 12 GB of RAM, and that is a very conservative estimate.
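The back-of-envelope arithmetic above can be written out explicitly (assumptions: float32 activations, an average of 25 words per sentence):

```python
# Memory estimate for storing ELMo activations for the whole corpus.
# Assumes float32 (4 bytes per value) and ~25 words per sentence on average.
floats_per_word = 2048 + 2 * 2 * 4096  # word embedding + LSTM states (2 layers x 2 directions)
bytes_per_word = floats_per_word * 4   # float32
total_words = 7255 * 25
total_gb = total_words * bytes_per_word / 1024**3

print(floats_per_word)        # 18432 floats per word
print(bytes_per_word / 1024)  # ~72 kB per word
print(total_gb)               # ~12.5 GB for the whole corpus
```

This only counts the states listed above; the intermediate tensors TensorFlow materializes during the forward pass push the real requirement higher.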
You need to split the sentences into batches and process the batches iteratively. There are many ways to do that, and I don't know what implementation you use or what exactly is in the variable sentences. But you can probably call tf.split on sentences and get a list of objects on which you call the session independently, or, if you use the tf.data API, you can use the batching it provides. You can also always split your data and use multiple input files.
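The batching itself is just slicing the list of sentences; a minimal sketch (the batch size of 64 is an arbitrary choice you would tune to your RAM, and inside a real session loop you would feed each batch through a string placeholder and call sess.run once per batch):

```python
def batches(items, batch_size):
    """Yield successive slices of `items`, each of length <= batch_size."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Example: 7255 sentences split into batches of 64.
sentences = ["sentence %d" % i for i in range(7255)]
chunks = list(batches(sentences, 64))

print(len(chunks))      # 114 batches
print(len(chunks[-1]))  # 23 sentences in the last, partial batch
```

Running the ELMo module once per chunk and concatenating the resulting arrays (e.g. with np.concatenate) gives the same embeddings as the single giant call, but with peak memory bounded by one batch instead of the whole corpus.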
Upvotes: 1