fodma1

Reputation: 3535

Pytorch LSTM grad only on last output

I'm working with sequences of different lengths, but I only want to compute gradients based on the output at the end of each sequence.

The samples are ordered by decreasing length and are zero-padded. For 5 one-dimensional samples it looks like this (omitting the width dimension for readability):

array([[5, 7, 7, 4, 5, 8, 6, 9, 7, 9],
       [6, 4, 2, 2, 6, 5, 4, 2, 2, 0],
       [4, 6, 2, 4, 5, 1, 3, 1, 0, 0],
       [8, 8, 3, 7, 7, 7, 9, 0, 0, 0],
       [3, 2, 7, 5, 7, 0, 0, 0, 0, 0]])

For the LSTM I'm using nn.utils.rnn.pack_padded_sequence with the individual sequence lengths:

x = nn.utils.rnn.pack_padded_sequence(x, [10, 9, 8, 7, 5], batch_first=True)

The initialization of LSTM in the Model constructor:

self.lstm = nn.LSTM(width, n_hidden, 2)

Then I call the LSTM and unpack the values:

x, _ = self.lstm(x)
x, _ = nn.utils.rnn.pad_packed_sequence(x, batch_first=True)

Then I apply a fully connected layer and a softmax:

x = x.contiguous()
x = x.view(-1, n_hidden)
x = self.linear(x)
x = x.reshape(batch_size, n_labels, 10) # 10 is the sample height
return F.softmax(x, dim=1)

This gives me an output of shape batch x n_labels x height (5x12x10).

For each sample, I only want to use a single score: the last output, of shape batch x n_labels (5x12). How can I achieve this?

One idea is to apply tanh to the last hidden state returned from the model, but I'm not quite sure that would give the same results. Is it possible to efficiently extract the output computed at the end of each sequence, e.g. using the same lengths passed to pack_padded_sequence?
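For reference, this is roughly what I imagine doing with the unpacked output (just a sketch; lengths is assumed to be a LongTensor holding the same values I pass to pack_padded_sequence, and x is the padded output of shape batch x seq_len x n_hidden):

lengths = torch.tensor([10, 9, 8, 7, 5], device=x.device)

# index of the last valid timestep for each sequence, expanded over the hidden dim
last_idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, x.size(2))  # (batch, 1, n_hidden)

# pick the output at that timestep for every sequence
last_out = x.gather(1, last_idx).squeeze(1)  # (batch, n_hidden)

y = self.linear(last_out)  # (batch, n_labels)
return F.softmax(y, dim=1)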

Upvotes: 1

Views: 2829

Answers (2)

David Ng

Reputation: 1698

As Neabfi answered, hidden[-1] is correct. To be more specific to your question, as the docs state:

output, (h_n, c_n) = self.lstm(x_pack)  # batch_first=True

# h_n is a tensor of shape (num_layers * num_directions, batch, hidden_size)

In your case, you have a stack of 2 LSTM layers with only the forward direction, so:

h_n has shape (num_layers, batch, hidden_size)

You probably want the hidden state h_n of the last layer; in that case, here is what you should do:

output, (h_n, c_n) = self.lstm(x_pack)
h = h_n[-1] # h of shape (batch, hidden_size)
y = self.linear(h)

Here is the code which wraps any recurrent layer (LSTM, RNN, or GRU) into a DynamicRNN. DynamicRNN can run recurrent computations on sequences of varying lengths without any concern for the order of the lengths.
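The linked code isn't reproduced here; the following is only a rough sketch of the idea, assuming a PyTorch version where pack_padded_sequence accepts enforce_sorted=False (the class and its constructor are illustrative, not the original implementation):

import torch.nn as nn

class DynamicRNN(nn.Module):
    """Runs a wrapped RNN/LSTM/GRU on padded batches of variable-length
    sequences, regardless of how the lengths are ordered (sketch)."""

    def __init__(self, rnn):
        super().__init__()
        self.rnn = rnn

    def forward(self, x, lengths):
        # x: (batch, seq_len, input_size); lengths: list or 1D CPU tensor
        packed = nn.utils.rnn.pack_padded_sequence(
            x, lengths, batch_first=True, enforce_sorted=False)
        packed_out, state = self.rnn(packed)  # state is (h_n, c_n) for LSTM
        out, _ = nn.utils.rnn.pad_packed_sequence(packed_out, batch_first=True)
        return out, state

It would be used like self.lstm = DynamicRNN(nn.LSTM(width, n_hidden, 2, batch_first=True)) and called with the padded batch together with its lengths.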

Upvotes: 1

Neabfi

Reputation: 4741

You can access the hidden state of the last layer as follows:

output, (hidden, cell) = self.lstm(x_pack)
y = self.linear(hidden[-1])
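Put together with the packing from the question, the forward pass would look roughly like this (names taken from the question, shapes in the comments):

packed = nn.utils.rnn.pack_padded_sequence(x, [10, 9, 8, 7, 5], batch_first=True)
output, (hidden, cell) = self.lstm(packed)
# hidden: (num_layers * num_directions, batch, hidden_size)
y = self.linear(hidden[-1])  # (batch, n_labels)
return F.softmax(y, dim=1)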

Upvotes: 0
