Reputation: 338
So I'm starting to study RNNs, particularly LSTMs, and there is a part of the theory that I just don't understand.
When you stack LSTM cells, I see that everybody detaches the hidden state from its history, but this makes no sense to me; aren't LSTMs supposed to use hidden states from history to make better predictions?
I read the documentation, but it is still not clear to me, so any explanation is welcome.
Upvotes: 1
Views: 2674
Reputation: 5079
You got it right: the hidden state in LSTMs is there to serve as memory. But then a question arises: are we supposed to learn it? No, the hidden state isn't a learned parameter, so we detach it. That lets the model use those values as its starting memory while not computing gradients through them.
If you don't detach, backpropagation flows through the entire history of the sequence: the computation graph keeps growing across batches, so each backward pass gets slower and more memory-hungry, and the ever-longer chain of multiplications can make the gradients blow up.
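Since your question doesn't include code, here is a minimal sketch of what that detach pattern looks like with PyTorch's `nn.LSTM`, trained with truncated backpropagation through time. The layer sizes, chunk length, and random toy data are made up purely for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
optimizer = torch.optim.Adam(
    list(lstm.parameters()) + list(head.parameters()), lr=1e-3
)
loss_fn = nn.MSELoss()

# Toy data: one long sequence of 100 steps, split into chunks of 20.
data = torch.randn(1, 100, 8)
targets = torch.randn(1, 100, 1)

hidden = None  # (h_0, c_0); None lets PyTorch initialize them to zeros
for start in range(0, 100, 20):
    x = data[:, start:start + 20]
    y = targets[:, start:start + 20]

    out, hidden = lstm(x, hidden)
    # Detach so the next chunk still uses these values as memory,
    # but backward() will not reach into earlier chunks.
    hidden = tuple(h.detach() for h in hidden)

    loss = loss_fn(head(out), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key line is `hidden = tuple(h.detach() for h in hidden)`: the next chunk still sees the numerical values of `h_n` and `c_n`, so the memory is preserved, but the gradient computation stops there instead of unrolling through everything that came before.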
Upvotes: 4