Reputation: 338
So I'm starting to study RNNs, particularly LSTMs, and there is a part of the theory that I just don't understand.
When you stack LSTM cells, I see that everybody detaches the hidden state from its history, but this makes no sense to me; aren't LSTMs supposed to use hidden states from history to make better predictions?
I read the documentation, but it is still not clear to me, so any explanation is welcome.
Upvotes: 1
Views: 2674
Reputation: 5079
You got it right: the hidden state in LSTMs is there to serve as memory. But then a question arises: are we supposed to learn it? No, the hidden state isn't a learned parameter, so we detach it. That lets the model use those values as its starting memory while not computing gradients through them.
If you don't detach, backpropagation flows through the entire history of the sequence: the computation graph keeps growing across batches, so each backward pass gets slower and more memory-hungry, and the ever-longer chain of multiplications can make the gradients blow up.
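Since your question doesn't include code, here is a minimal sketch of what that detach pattern looks like with PyTorch's `nn.LSTM`, trained with truncated backpropagation through time. The layer sizes, chunk length, and random toy data are made up purely for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
optimizer = torch.optim.Adam(
    list(lstm.parameters()) + list(head.parameters()), lr=1e-3
)
loss_fn = nn.MSELoss()

# Toy data: one long sequence of 100 steps, split into chunks of 20.
data = torch.randn(1, 100, 8)
targets = torch.randn(1, 100, 1)

hidden = None  # (h_0, c_0); None lets PyTorch initialize them to zeros
for start in range(0, 100, 20):
    x = data[:, start:start + 20]
    y = targets[:, start:start + 20]

    out, hidden = lstm(x, hidden)
    # Detach so the next chunk still uses these values as memory,
    # but backward() will not reach into earlier chunks.
    hidden = tuple(h.detach() for h in hidden)

    loss = loss_fn(head(out), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key line is `hidden = tuple(h.detach() for h in hidden)`: the next chunk still sees the numerical values of `h_n` and `c_n`, so the memory is preserved, but the gradient computation stops there instead of unrolling through everything that came before.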
Upvotes: 4