sww

Reputation: 886

PyTorch LSTM hidden state

Why do we need to initialize the hidden state h0 of an LSTM in PyTorch? h0 will be computed and overwritten anyway, won't it? Isn't it like:

    int a = 0;

    a = 4;

Even if we do not do a = 0, it should be fine.
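
To make the question concrete, here is a minimal sketch (dimensions are arbitrary): PyTorch lets you pass (h0, c0) explicitly, but if you omit them they default to zeros, so the explicit zero state looks redundant.

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)
    x = torch.randn(4, 10, 8)  # (batch, seq_len, input_size)

    # Explicit zero initial state, shape (num_layers, batch, hidden_size).
    h0 = torch.zeros(1, 4, 16)
    c0 = torch.zeros(1, 4, 16)
    out_a, _ = lstm(x, (h0, c0))

    # Omitting the initial state: PyTorch defaults it to zeros.
    out_b, _ = lstm(x)
    print(torch.allclose(out_a, out_b))  # True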

Upvotes: 3

Views: 4256

Answers (1)

nemo

Reputation: 57737

The point is that you are able to supply the initial state; it is a feature. They could have implemented it as a default, but by letting you control the allocation of the tensor you can save some memory (allocate once, zero it on every invocation).
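
For example, a rough sketch of that allocate-once pattern (sizes are made up; shown as an inference loop so in-place zeroing is safe):

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)

    # Allocate the state tensors once, up front...
    h0 = torch.zeros(1, 4, 16)
    c0 = torch.zeros(1, 4, 16)

    with torch.no_grad():
        for _ in range(100):           # e.g. an inference loop over batches
            x = torch.randn(4, 10, 8)
            # ...and only zero them in place on each invocation,
            # instead of allocating fresh tensors every time.
            h0.zero_()
            c0.zero_()
            out, _ = lstm(x, (h0, c0))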

Why would you need to set h? Sequence-to-sequence models require this (compress the input to one vector, then use that vector as the initial hidden state of the decoder), or you might want to make the initial state learnable.
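
A minimal sketch of the learnable variant (the module name and dimensions are made up for illustration): register h0 and c0 as nn.Parameter so the optimizer trains them along with the LSTM weights.

    import torch
    import torch.nn as nn

    class LSTMWithLearnableInit(nn.Module):
        def __init__(self, input_size=8, hidden_size=16, num_layers=1):
            super().__init__()
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            # Registered as parameters, so gradients flow into the initial state.
            self.h0 = nn.Parameter(torch.zeros(num_layers, 1, hidden_size))
            self.c0 = nn.Parameter(torch.zeros(num_layers, 1, hidden_size))

        def forward(self, x):
            batch = x.size(0)
            # Broadcast the learned state across the batch dimension.
            h0 = self.h0.expand(-1, batch, -1).contiguous()
            c0 = self.c0.expand(-1, batch, -1).contiguous()
            return self.lstm(x, (h0, c0))

    model = LSTMWithLearnableInit()
    out, (hn, cn) = model(torch.randn(4, 10, 8))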

Upvotes: 4
