Reputation: 1560
I'm training an LSTM network and I'm looking to understand best practices for training on long sequences, of length O(1k) or more. What is a good approach to choosing a minibatch size? How would skew in label prevalence influence that choice? (Positives are rare in my scenario.) Is it worthwhile to make an effort to rebalance my data? Thanks.
Upvotes: 3
Views: 944
Reputation: 12175
You probably want to rebalance the classes so they are roughly 50/50. Otherwise the model will skew its predictions toward the majority class.
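If you go the rebalancing route, oversampling the rare positives is a common way to get roughly 50/50 minibatches without throwing away negatives. Here is a minimal sketch using PyTorch's `WeightedRandomSampler`; the `labels` and `sequences` arrays are toy stand-ins for your own data, and the sizes are arbitrary:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-in for your data: rare positives (~5%).
labels = np.random.binomial(1, 0.05, size=2_000)
sequences = np.random.randn(2_000, 1000, 4).astype(np.float32)  # (N, seq_len, features)
dataset = TensorDataset(torch.from_numpy(sequences), torch.from_numpy(labels))

# Weight each sample inversely to its class frequency so each
# minibatch is roughly 50/50 in expectation.
class_counts = np.bincount(labels)
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(labels),
    replacement=True,  # replacement lets the minority class be oversampled
)

loader = DataLoader(dataset, batch_size=64, sampler=sampler)
```

An alternative with the same effect is to keep the natural class mix and weight the loss instead (e.g. `pos_weight` in `BCEWithLogitsLoss`).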
As for the batch size, I would go as large as will fit in GPU memory.
I am not sure LSTMs will be able to learn dependencies at O(1k) length, but it is worth a try. You could look into something like WaveNet if you want ultra-long dependencies.
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
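For reference, the core idea behind WaveNet is a stack of dilated causal 1D convolutions whose receptive field grows exponentially with depth, so a handful of layers can cover O(1k) timesteps. A minimal sketch of that dilation stack in PyTorch (not the full WaveNet architecture; channel count, kernel size, and depth here are illustrative choices):

```python
import torch
import torch.nn as nn

class DilatedCausalStack(nn.Module):
    """Stack of dilated causal convolutions: with dilations 1, 2, 4, ...
    the receptive field doubles per layer, reaching 2**n_layers steps."""
    def __init__(self, channels: int = 32, kernel_size: int = 2, n_layers: int = 10):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            dilation = 2 ** i
            self.layers.append(nn.Conv1d(
                channels, channels, kernel_size,
                dilation=dilation,
                # symmetric padding; the right side is trimmed in forward()
                # so no layer ever sees future timesteps (causality)
                padding=(kernel_size - 1) * dilation,
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        for conv in self.layers:
            pad = conv.padding[0]
            x = torch.relu(conv(x)[..., :-pad])  # trim right side -> causal
        return x

# 10 layers with kernel size 2 give a receptive field of 2**10 = 1024 steps.
model = DilatedCausalStack()
out = model(torch.randn(4, 32, 1024))  # (batch=4, channels=32, time=1024)
```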
Upvotes: 2