jvc

Reputation: 614

Training a TensorFlow RNN with large datasets

I'm training an RNN in TensorFlow, using the "rnn" function from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn.py:

outputs, states = rnn.rnn(cell, inputs, initial_state=initial_state, sequence_length=seq_length)

I use this function because my data sequences are of variable lengths. However, the function expects all of the data to be loaded at once, and my data doesn't fit into memory all at once, so I need to load it piece by piece. Any pointers on how this can be done would be highly appreciated.
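For reference, here is a minimal sketch of how I'm building the graph (old TF 0.x API; the sizes and placeholder names here are illustrative, not my real ones):

    import tensorflow as tf
    from tensorflow.python.ops import rnn

    # Illustrative sizes, not the real ones.
    batch_size, num_steps, input_dim, hidden_dim = 32, 50, 100, 200

    # rnn.rnn takes a Python list of [batch_size, input_dim] tensors,
    # one entry per time step.
    inputs = [tf.placeholder(tf.float32, [batch_size, input_dim])
              for _ in range(num_steps)]
    seq_length = tf.placeholder(tf.int32, [batch_size])

    cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_dim)
    initial_state = cell.zero_state(batch_size, tf.float32)

    outputs, states = rnn.rnn(cell, inputs,
                              initial_state=initial_state,
                              sequence_length=seq_length)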

Thanks

Upvotes: 1

Views: 1334

Answers (1)

Peter Hawkins

Reputation: 3211

The standard practice here is to break your data up into chunks and work on one chunk at a time. For example, if you are working with text, you might break your data up into sentences and pass mini-batches of tens to hundreds of sentences to the training process one at a time; a sketch of that pattern is below.
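As a rough sketch of that idea (the file format, helper names, and the placeholders in the commented training loop are all assumptions on my part, not anything from TensorFlow itself):

    import numpy as np

    def pad_batch(batch):
        """Zero-pad a list of variable-length id sequences to this
        batch's own maximum length; also return the true lengths,
        which is what sequence_length wants."""
        lengths = np.array([len(s) for s in batch], dtype=np.int32)
        padded = np.zeros((len(batch), lengths.max()), dtype=np.int32)
        for i, seq in enumerate(batch):
            padded[i, :len(seq)] = seq
        return padded, lengths

    def sentence_batches(path, batch_size):
        """Stream mini-batches from disk so the full dataset is never
        held in memory. Assumed file format: one sentence per line,
        space-separated integer token ids."""
        batch = []
        with open(path) as f:
            for line in f:
                batch.append([int(tok) for tok in line.split()])
                if len(batch) == batch_size:
                    yield pad_batch(batch)
                    batch = []
        if batch:  # final, possibly smaller, batch
            yield pad_batch(batch)

    # Training loop: feed one chunk at a time. `train_op`, `inputs_ph`,
    # and `lengths_ph` stand in for whatever your graph defines.
    # for padded, lengths in sentence_batches("corpus.txt", batch_size=64):
    #     sess.run(train_op, feed_dict={inputs_ph: padded,
    #                                   lengths_ph: lengths})

Since only one mini-batch is materialized at a time, memory use is bounded by the batch size rather than the dataset size.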

For an example of how to do this, take a look at this RNN tutorial.

https://www.tensorflow.org/versions/r0.9/tutorials/recurrent/index.html

The tutorial text itself doesn't describe chunking in detail, but take a look at the associated code on GitHub to see how it loads its input data and batches it for training; a sketch of its batching scheme follows the link.

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/models/rnn/ptb
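The gist of that reader, as I understand it (this is my own paraphrase of the scheme, not the project's code), is to lay the whole token stream out as batch_size parallel rows and slide a fixed num_steps window across them:

    import numpy as np

    def batch_iterator(raw_ids, batch_size, num_steps):
        """Paraphrase of the PTB reader's batching: reshape the token
        stream into `batch_size` rows, then yield (input, target)
        windows of `num_steps` columns, with targets shifted right
        by one position."""
        raw = np.array(raw_ids, dtype=np.int32)
        batch_len = len(raw) // batch_size
        data = raw[:batch_size * batch_len].reshape(batch_size, batch_len)
        for i in range((batch_len - 1) // num_steps):
            x = data[:, i * num_steps:(i + 1) * num_steps]
            y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]
            yield x, y

Note that the PTB example trains on fixed-size windows (truncated backpropagation through time) rather than variable-length sentences, so for your sequence_length-based setup you would combine this chunked-loading idea with per-batch padding as sketched above.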

Hope that helps!

Upvotes: 2
