Reputation: 880
I was reading the implementation of LSTM in Pytorch. The code goes like this:
lstm = nn.LSTM(3, 3)  # input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

# initialize the hidden state
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))

for i in inputs:
    # Step through the sequence one element at a time.
    # After each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)
I don't understand why the hidden state is defined as a tuple of two tensors instead of one. Isn't the hidden layer simply a layer of a feed-forward neural network, i.e. a single vector?
Upvotes: 0
Views: 5786
Reputation: 16450
Apart from the hidden state h, an LSTM also maintains a cell state, C. That is why the state is passed as a tuple of two tensors. See https://pytorch.org/docs/stable/nn.html#lstmcell.
If you don't pass an initial state, both h and C are taken to be all zeros.
Note that this is specific to the LSTM; GRU and vanilla RNN have no cell state, so they take and return a single tensor instead of a tuple.
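To illustrate, here is a minimal sketch comparing the state returned by `nn.LSTM` and `nn.GRU`. The layer sizes are arbitrary; the point is only that the LSTM's state is a `(h, c)` tuple while the GRU's is a single tensor:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(3, 3)
gru = nn.GRU(3, 3)

x = torch.randn(5, 1, 3)  # (seq_len, batch, input_size)

# LSTM: the state is a tuple of (hidden state h_n, cell state c_n)
out, (h_n, c_n) = lstm(x)
print(h_n.shape, c_n.shape)  # both (num_layers, batch, hidden_size) = (1, 1, 3)

# GRU: no cell state, so the state is a single tensor
out, h_n = gru(x)
print(h_n.shape)  # (1, 1, 3)
```

Since no initial state is passed to either call above, it defaults to zeros in both cases.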
Upvotes: 2