fodma1

Reputation: 3535

Pytorch LSTM grad only on last output

I'm working with sequences of different lengths, but I only want to compute gradients based on the output at the end of each sequence.

The samples are ordered by decreasing length and are zero-padded. For 5 one-dimensional samples it looks like this (omitting the width dimension for readability):

array([[5, 7, 7, 4, 5, 8, 6, 9, 7, 9],
       [6, 4, 2, 2, 6, 5, 4, 2, 2, 0],
       [4, 6, 2, 4, 5, 1, 3, 1, 0, 0],
       [8, 8, 3, 7, 7, 7, 9, 0, 0, 0],
       [3, 2, 7, 5, 7, 0, 0, 0, 0, 0]])

For the LSTM I'm using nn.utils.rnn.pack_padded_sequence with the individual sequence lengths:

x = nn.utils.rnn.pack_padded_sequence(x, [10, 9, 8, 7, 5], batch_first=True)

The initialization of LSTM in the Model constructor:

self.lstm = nn.LSTM(width, n_hidden, 2)

Then I call the LSTM and unpack the values:

x, _ = self.lstm(x)
x, _ = nn.utils.rnn.pad_packed_sequence(x, batch_first=True)

Then I apply a fully connected layer and a softmax:

x = x.contiguous()
x = x.view(-1, n_hidden)
x = self.linear(x)
x = x.reshape(batch_size, n_labels, 10) # 10 is the sample height
return F.softmax(x, dim=1)

This gives me an output of shape batch x n_labels x height (5x12x10).

For each sample, I only want to use a single score: the last output, of shape batch x n_labels (5x12). How can I achieve this?

One idea is to apply tanh to the last hidden state returned from the model, but I'm not quite sure that would give the same results. Is it possible to efficiently extract the output computed at the end of each sequence, e.g. using the same lengths passed to pack_padded_sequence?
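For reference, this is roughly what I imagine doing with the unpacked output (just a sketch; lengths is assumed to be a LongTensor holding the same values I pass to pack_padded_sequence, and x is the padded output of shape batch x seq_len x n_hidden):

lengths = torch.tensor([10, 9, 8, 7, 5], device=x.device)

# index of the last valid timestep for each sequence, expanded over the hidden dim
last_idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, x.size(2))  # (batch, 1, n_hidden)

# pick the output at that timestep for every sequence
last_out = x.gather(1, last_idx).squeeze(1)  # (batch, n_hidden)

y = self.linear(last_out)  # (batch, n_labels)
return F.softmax(y, dim=1)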

Upvotes: 1

Views: 2829

Answers (2)

David Ng

Reputation: 1698

As Neabfi answered, hidden[-1] is correct. To be more specific to your question, as the docs state:

output, (h_n, c_n) = self.lstm(x_pack)  # batch_first=True

# h_n is a tensor of shape (num_layers * num_directions, batch, hidden_size)

In your case, you have a stack of 2 LSTM layers with only the forward direction, so:

h_n has shape (num_layers, batch, hidden_size)

You probably want the hidden state h_n of the last layer; in that case, here is what you should do:

output, (h_n, c_n) = self.lstm(x_pack)
h = h_n[-1] # h of shape (batch, hidden_size)
y = self.linear(h)

Here is the code which wraps any recurrent layer (LSTM, RNN, or GRU) into a DynamicRNN. DynamicRNN can run recurrent computations on sequences of varying lengths without any concern for the order of the lengths.
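The linked code isn't reproduced here; the following is only a rough sketch of the idea, assuming a PyTorch version where pack_padded_sequence accepts enforce_sorted=False (the class and its constructor are illustrative, not the original implementation):

import torch.nn as nn

class DynamicRNN(nn.Module):
    """Runs a wrapped RNN/LSTM/GRU on padded batches of variable-length
    sequences, regardless of how the lengths are ordered (sketch)."""

    def __init__(self, rnn):
        super().__init__()
        self.rnn = rnn

    def forward(self, x, lengths):
        # x: (batch, seq_len, input_size); lengths: list or 1D CPU tensor
        packed = nn.utils.rnn.pack_padded_sequence(
            x, lengths, batch_first=True, enforce_sorted=False)
        packed_out, state = self.rnn(packed)  # state is (h_n, c_n) for LSTM
        out, _ = nn.utils.rnn.pad_packed_sequence(packed_out, batch_first=True)
        return out, state

It would be used like self.lstm = DynamicRNN(nn.LSTM(width, n_hidden, 2, batch_first=True)) and called with the padded batch together with its lengths.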

Upvotes: 1

Neabfi

Reputation: 4741

You can access the hidden state of the last layer as follows:

output, (hidden, cell) = self.lstm(x_pack)
y = self.linear(hidden[-1])
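Put together with the packing from the question, the forward pass would look roughly like this (names taken from the question, shapes in the comments):

packed = nn.utils.rnn.pack_padded_sequence(x, [10, 9, 8, 7, 5], batch_first=True)
output, (hidden, cell) = self.lstm(packed)
# hidden: (num_layers * num_directions, batch, hidden_size)
y = self.linear(hidden[-1])  # (batch, n_labels)
return F.softmax(y, dim=1)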

Upvotes: 0
