Reputation: 6154
Specifically what spurred this question is the return_sequence
argument of TensorFlow's version of an LSTM layer.
The docs say:
Boolean. Whether to return the last output. in the output sequence, or the full sequence. Default: False.
I've seen some implementations, especially autoencoders that use this argument to strip everything but the last element in the output sequence as the output of the 'encoder' half of the autoencoder.
Below are three different implementations. I'd like to understand the reasons behind the differences, as the seem like very large differences but all call themselves the same thing.
This implementation strips away all outputs of the LSTM except the last element of the sequence, and then repeats that element some number of times to reconstruct the sequence:
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_in,1)))
# Decoder below
model.add(RepeatVector(n_out))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
When looking at implementations of autoencoders in PyTorch, I don't see authors doing this. Instead they use the entire output of the LSTM for the encoder (sometimes followed by a dense layer and sometimes not).
This implementation trains an embedding BEFORE an LSTM layer is applied... It seems to almost defeat the idea of an LSTM based auto-encoder... The sequence is already encoded by the time it hits the LSTM layer.
class EncoderLSTM(nn.Module):
def __init__(self, input_size, hidden_size, n_layers=1, drop_prob=0):
super(EncoderLSTM, self).__init__()
self.hidden_size = hidden_size
self.n_layers = n_layers
self.embedding = nn.Embedding(input_size, hidden_size)
self.lstm = nn.LSTM(hidden_size, hidden_size, n_layers, dropout=drop_prob, batch_first=True)
def forward(self, inputs, hidden):
# Embed input words
embedded = self.embedding(inputs)
# Pass the embedded word vectors into LSTM and return all outputs
output, hidden = self.lstm(embedded, hidden)
return output, hidden
This example encoder first expands the input with one LSTM layer, then does its compression via a second LSTM layer with a smaller number of hidden nodes. Besides the expansion, this seems in line with this paper I found: https://arxiv.org/pdf/1607.00148.pdf
However, in this implementation's decoder, there is no final dense layer. The decoding happens through a second lstm layer that expands the encoding back to the same dimension as the original input. See it here. This is not in line with the paper (although I don't know if the paper is authoritative or not).
class Encoder(nn.Module):
def __init__(self, seq_len, n_features, embedding_dim=64):
super(Encoder, self).__init__()
self.seq_len, self.n_features = seq_len, n_features
self.embedding_dim, self.hidden_dim = embedding_dim, 2 * embedding_dim
self.rnn1 = nn.LSTM(
input_size=n_features,
hidden_size=self.hidden_dim,
num_layers=1,
batch_first=True
)
self.rnn2 = nn.LSTM(
input_size=self.hidden_dim,
hidden_size=embedding_dim,
num_layers=1,
batch_first=True
)
def forward(self, x):
x = x.reshape((1, self.seq_len, self.n_features))
x, (_, _) = self.rnn1(x)
x, (hidden_n, _) = self.rnn2(x)
return hidden_n.reshape((self.n_features, self.embedding_dim))
I'm wondering about this discrepancy in implementations. The difference seems quite large. Are all of these valid ways to accomplish the same thing? Or are some of these mis-guided attempts at a "real" LSTM autoencoder?
Upvotes: 4
Views: 1121
Reputation: 2253
There is no official or correct way of designing the architecture of an LSTM based autoencoder... The only specifics the name provides is that the model should be an Autoencoder and that it should use an LSTM layer somewhere.
The implementations you found are each different and unique on their own even though they could be used for the same task.
Let's describe them:
TF implementation:
LSTM layer
in Keras/TF is to output only the last output of the LSTM, you could set it to output all the output steps with the return_sequences
parameter.(batch_size, LSTM_units)
Dense(1)
in the last layer in order to get the same shape as the input.PyTorch 1:
PyTorch 2:
(seq_len, 1)
as in the first TF example, so the decoder doesn't need a dense after. The author used a number of units in the LSTM layer equal to the input shape.In the end you choose the architecture of your model depending on the data you want to train on, specifically: the nature (text, audio, images), the input shape, the amount of data you have and so on...
Upvotes: 2