Reputation: 7312
What is the difference between LSTM and LSTMCell in PyTorch (currently version 1.1)? It seems that LSTMCell is a special case of LSTM (i.e. with only one layer, unidirectional, and no dropout).
Then, what's the purpose of having both implementations? Unless I'm missing something, it's trivial to use an LSTM object as an LSTMCell (or, alternatively, it's pretty easy to use multiple LSTMCells to create an LSTM).
Upvotes: 34
Views: 22403
Reputation: 122
Let me show some specific examples:
# LSTM example:
>>> rnn = nn.LSTM(10, 20, 2)  # input_size=10, hidden_size=20, num_layers=2
>>> input = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
>>> h0 = torch.randn(2, 3, 20)  # (num_layers, batch, hidden_size)
>>> c0 = torch.randn(2, 3, 20)
>>> output, (hn, cn) = rnn(input, (h0, c0))
# LSTMCell example:
>>> rnn = nn.LSTMCell(10, 20)  # input_size=10, hidden_size=20
>>> input = torch.randn(6, 3, 10)  # (time_steps, batch, input_size)
>>> hx = torch.randn(3, 20)  # (batch, hidden_size)
>>> cx = torch.randn(3, 20)
>>> output = []
>>> for i in range(6):
...     hx, cx = rnn(input[i], (hx, cx))
...     output.append(hx)
The key difference:

LSTM: the third argument, 2, stands for num_layers, the number of stacked recurrent layers. There are seq_len * num_layers = 5 * 2 = 10 cell computations. No loop, but more cells.

LSTMCell: a for loop (run 6 times here, once per time step) in which the output hx, cx of the i-th step becomes the input of the (i+1)-th step. There is only one cell: truly recurrent.

If we set num_layers=1 in the LSTM, or stack a second LSTMCell, the two snippets above compute the same thing (see the sketch below).

Obviously, it is easier to apply parallel computing with LSTM.
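A minimal sketch of that equivalence (assuming num_layers=1 and manually copying the layer's weights into the cell; variable names are illustrative):

import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(10, 20, 1)   # single layer, unidirectional
cell = nn.LSTMCell(10, 20)

# Copy the layer's parameters into the cell so both compute the same function.
cell.weight_ih.data = lstm.weight_ih_l0.data
cell.weight_hh.data = lstm.weight_hh_l0.data
cell.bias_ih.data = lstm.bias_ih_l0.data
cell.bias_hh.data = lstm.bias_hh_l0.data

x = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 = torch.zeros(1, 3, 20)
c0 = torch.zeros(1, 3, 20)

out_lstm, _ = lstm(x, (h0, c0))

# Unroll the cell over the sequence by hand.
hx, cx = h0[0], c0[0]
outputs = []
for t in range(5):
    hx, cx = cell(x[t], (hx, cx))
    outputs.append(hx)
out_cell = torch.stack(outputs)

print(torch.allclose(out_lstm, out_cell, atol=1e-6))  # True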
Upvotes: 1
Reputation: 11213
Yes, you can emulate one with the other; the reason for having them separate is efficiency.

LSTMCell is a cell that takes arguments:

- an input of shape (batch, input_size);
- a tuple of hidden and cell states, each of shape (batch, hidden_size).

It is a straightforward implementation of the equations.
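For reference, a minimal sketch of those equations, following nn.LSTMCell's convention of stacking the four gates (input, forget, candidate, output) along dim 0 of the weight matrices; the function name is just illustrative:

import torch

def lstm_cell_forward(x, h, c, w_ih, w_hh, b_ih, b_hh):
    # One step of the LSTM recurrence.
    gates = x @ w_ih.t() + b_ih + h @ w_hh.t() + b_hh
    i, f, g, o = gates.chunk(4, dim=1)
    c_next = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
    h_next = torch.sigmoid(o) * torch.tanh(c_next)
    return h_next, c_next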
LSTM is a layer applying an LSTM cell (or multiple LSTM cells) in a "for loop", but the loop is heavily optimized using cuDNN. Its input is:

- a three-dimensional tensor of shape (seq_len, batch, input_size);
- optionally, an initial hidden/cell state tuple, each of shape (num_layers * num_directions, batch, hidden_size).
You might often want to use the LSTM cell in a different context than applying it over a sequence, e.g., to build an LSTM that operates over a tree-like structure. When you write a decoder in sequence-to-sequence models, you also call the cell in a loop and stop the loop when the end-of-sequence symbol is decoded.
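For example, a greedy seq2seq decoding loop with LSTMCell might look like this sketch (the module sizes, EOS_ID, and the greedy_decode helper are hypothetical names, not from the question):

import torch
import torch.nn as nn

VOCAB, EMB, HID, EOS_ID, MAX_LEN = 1000, 32, 64, 2, 50  # hypothetical sizes

embedding = nn.Embedding(VOCAB, EMB)
cell = nn.LSTMCell(EMB, HID)
projection = nn.Linear(HID, VOCAB)

def greedy_decode(hx, cx, start_id):
    # Decode one sequence greedily, stopping at EOS or after MAX_LEN steps.
    token = torch.tensor([start_id])
    result = []
    for _ in range(MAX_LEN):
        hx, cx = cell(embedding(token), (hx, cx))  # one decoding step
        token = projection(hx).argmax(dim=-1)      # greedy token choice
        if token.item() == EOS_ID:
            break
        result.append(token.item())
    return result

# In practice hx, cx would come from the encoder; zeros here for illustration.
print(greedy_decode(torch.zeros(1, HID), torch.zeros(1, HID), start_id=1))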
Upvotes: 51