Reputation: 1560
I'm training an LSTM network and I'm looking to understand best practices for training on long sequences, of length O(1k) or more. What is a good approach to choosing a minibatch size? How would skew in label prevalence influence that choice? (Positives are rare in my scenario.) Is it worthwhile to make an effort to rebalance my data? Thanks.
Upvotes: 3
Views: 944
Reputation: 12175
You probably want to rebalance the classes so they are roughly 50/50. Otherwise the model will skew its predictions toward the majority class.
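If you go the rebalancing route, oversampling the rare positives is a common way to get roughly 50/50 minibatches without throwing away negatives. Here is a minimal sketch using PyTorch's `WeightedRandomSampler`; the `labels` and `sequences` arrays are toy stand-ins for your own data, and the sizes are arbitrary:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-in for your data: rare positives (~5%).
labels = np.random.binomial(1, 0.05, size=2_000)
sequences = np.random.randn(2_000, 1000, 4).astype(np.float32)  # (N, seq_len, features)
dataset = TensorDataset(torch.from_numpy(sequences), torch.from_numpy(labels))

# Weight each sample inversely to its class frequency so each
# minibatch is roughly 50/50 in expectation.
class_counts = np.bincount(labels)
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(labels),
    replacement=True,  # replacement lets the minority class be oversampled
)

loader = DataLoader(dataset, batch_size=64, sampler=sampler)
```

An alternative with the same effect is to keep the natural class mix and weight the loss instead (e.g. `pos_weight` in `BCEWithLogitsLoss`).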
As for the batch size, I would go as large as will fit in GPU memory.
I am not sure LSTMs will be able to learn dependencies at O(1k) length, but it is worth a try. You could look into something like WaveNet if you want ultra-long dependencies.
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
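For reference, the core idea behind WaveNet is a stack of dilated causal 1D convolutions whose receptive field grows exponentially with depth, so a handful of layers can cover O(1k) timesteps. A minimal sketch of that dilation stack in PyTorch (not the full WaveNet architecture; channel count, kernel size, and depth here are illustrative choices):

```python
import torch
import torch.nn as nn

class DilatedCausalStack(nn.Module):
    """Stack of dilated causal convolutions: with dilations 1, 2, 4, ...
    the receptive field doubles per layer, reaching 2**n_layers steps."""
    def __init__(self, channels: int = 32, kernel_size: int = 2, n_layers: int = 10):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            dilation = 2 ** i
            self.layers.append(nn.Conv1d(
                channels, channels, kernel_size,
                dilation=dilation,
                # symmetric padding; the right side is trimmed in forward()
                # so no layer ever sees future timesteps (causality)
                padding=(kernel_size - 1) * dilation,
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        for conv in self.layers:
            pad = conv.padding[0]
            x = torch.relu(conv(x)[..., :-pad])  # trim right side -> causal
        return x

# 10 layers with kernel size 2 give a receptive field of 2**10 = 1024 steps.
model = DilatedCausalStack()
out = model(torch.randn(4, 32, 1024))  # (batch=4, channels=32, time=1024)
```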
Upvotes: 2