Reputation: 886
Why do we need to initialize the hidden state h0 of an LSTM in PyTorch? h0 will be computed and overwritten anyway. Isn't it like:

    int a = 0
    a = 4

Even if we do not do a = 0, it should be fine.
Upvotes: 3
Views: 4256
Reputation: 57737
The point is that you are able to supply the initial state; it is a feature. They could have made it an implicit default, but by letting you control the allocation of the tensor you can save some memory: allocate it once and just zero it on every invocation.
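A minimal sketch of that trade-off, assuming a standard nn.LSTM (all sizes below are made up for illustration): omitting the initial state makes PyTorch fall back to fresh zero tensors, while supplying one lets you allocate the buffers once and zero them in place before each call.

    import torch
    import torch.nn as nn

    batch, seq_len, input_size, hidden_size, num_layers = 4, 10, 8, 16, 1

    lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
    x = torch.randn(batch, seq_len, input_size)

    # Option 1: omit the initial state; PyTorch uses zero tensors internally.
    out, (h_n, c_n) = lstm(x)

    # Option 2: allocate (h0, c0) once and reuse them, zeroing in place
    # before each invocation instead of reallocating every time.
    h0 = torch.zeros(num_layers, batch, hidden_size)
    c0 = torch.zeros(num_layers, batch, hidden_size)
    for x_batch in (x, x):  # stand-in for iterating over a data loader
        h0.zero_()
        c0.zero_()
        out, (h_n, c_n) = lstm(x_batch, (h0, c0))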
Why would you need to set h0? Sequence-to-sequence models require it (compress the input into one vector and use that vector as the initial hidden state of the decoder), or you might want to make the initial state learnable.
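A sketch of both use cases, assuming nn.LSTM throughout (the class name LSTMWithLearnableInit and all sizes are hypothetical, not from the original answer):

    import torch
    import torch.nn as nn

    # Use case 1: seq2seq -- the decoder starts from the encoder's final state.
    enc = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
    dec = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
    src = torch.randn(4, 10, 8)        # source batch
    tgt = torch.randn(4, 7, 8)         # target batch
    _, (h_n, c_n) = enc(src)           # compress the input into one state
    dec_out, _ = dec(tgt, (h_n, c_n))  # decoder's h0/c0 = encoder's h_n/c_n

    # Use case 2: a learnable initial state, trained with the rest of the model.
    class LSTMWithLearnableInit(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers=1):
            super().__init__()
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                                batch_first=True)
            self.h0 = nn.Parameter(torch.zeros(num_layers, 1, hidden_size))
            self.c0 = nn.Parameter(torch.zeros(num_layers, 1, hidden_size))

        def forward(self, x):
            batch = x.size(0)
            # Broadcast the learned states across the batch dimension.
            h0 = self.h0.expand(-1, batch, -1).contiguous()
            c0 = self.c0.expand(-1, batch, -1).contiguous()
            return self.lstm(x, (h0, c0))

    model = LSTMWithLearnableInit(input_size=8, hidden_size=16)
    out, (h_n, c_n) = model(torch.randn(4, 10, 8))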
Upvotes: 4