David Parks

Reputation: 32111

Tensorflow RNNs with variable length sequences, padded zeros affect learning

I've set up an RNN in tensorflow that takes a variable sequence and makes 1 prediction at the end of the sequence.

I've zero-padded my data to a max sequence length of 500, but many sequences in a batch are shorter than 500.

I use dynamic_rnn and pass it the sequence lengths of each sample in the batch:

# Get lstm cell output
m.outputs, m.states = tf.nn.dynamic_rnn(
    cell=lstm_cell,
    dtype=tf.float32,
    sequence_length=m.X_lengths,
    inputs=m.X)

Here m.X_lengths is a tensor of per-sample sequence lengths, set up as a placeholder and passed in via the feed_dict.
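To make the setup concrete, here is a minimal NumPy sketch (names hypothetical) of what gets fed for m.X and m.X_lengths: variable-length sequences padded to a common max length, plus an array recording each sample's true length.

```python
import numpy as np

# Hypothetical batch of 3 variable-length sequences, each with 4 features
# per timestep: lengths 3, 5, and 2.
sequences = [np.ones((3, 4)), np.ones((5, 4)), np.ones((2, 4))]
max_len = 5

# Zero-pad every sequence out to max_len, and keep the true lengths.
batch = np.zeros((len(sequences), max_len, 4), dtype=np.float32)
lengths = np.array([len(s) for s in sequences], dtype=np.int32)
for i, s in enumerate(sequences):
    batch[i, :len(s)] = s

# These two arrays are what would go in the feed_dict, e.g.
# feed_dict = {m.X: batch, m.X_lengths: lengths}
print(batch.shape)        # (3, 5, 4)
print(lengths.tolist())   # [3, 5, 2]
```

The padding is purely structural; it's the lengths array that tells dynamic_rnn where each real sequence ends.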

The cost function is sigmoid cross entropy (multi-class classification); I take the last value from m.outputs and reduce it with tf.reduce_mean.

Notably, I did not do any masking of the loss function. My understanding is that masking would only be needed if I were summing losses over all outputs, but I'm only using the last output.

Now I've added 1000 more padded zeros onto my sequences: the largest real sequence length is still just 500, but every sequence in the batch now has padded length 1500. If the padding had no effect, this would learn the same as without the additional padding. In practice, training with this additional padding hurts learning, and restricting my sequence lengths to 100 improves the results.

Questions:

Upvotes: 3

Views: 2936

Answers (1)

user2827214

Reputation: 1191

You can pass in a placeholder for sequence_length, and doing so is necessary when you use padding in your input sequences. The sequence_length parameter tells the RNN to stop computation for each sample once its true length is reached, so the padded steps are not processed.

Without it, the more padding a sequence carries, the more padded steps get folded into the computation of the final state, degrading your signal (if you are using the last output). Instead, make sure the 'last output' you take corresponds to the true length of each sequence: if a sequence has length 7, the 'last output' you want is outputs[6].

If you do pass the sequence_length parameter to dynamic_rnn(), you will see that every output after outputs[6] is just a zero vector.
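The per-sample indexing described above can be illustrated in NumPy (a hypothetical sketch, with fake "RNN outputs"): entries past each sequence's true length are zero, and the meaningful last output for sample i sits at outputs[i, lengths[i] - 1].

```python
import numpy as np

batch_size, max_len, hidden = 3, 5, 2
lengths = np.array([5, 3, 1])

# Fake dynamic_rnn-style outputs: nonzero up to each true length,
# zero vectors afterwards (as dynamic_rnn produces with sequence_length set).
outputs = np.zeros((batch_size, max_len, hidden))
for i, n in enumerate(lengths):
    outputs[i, :n] = np.arange(1, n + 1)[:, None]

# Gather the last *valid* output per sample, not outputs[:, -1].
last = outputs[np.arange(batch_size), lengths - 1]  # shape (batch_size, hidden)
print(last.tolist())  # [[5.0, 5.0], [3.0, 3.0], [1.0, 1.0]]
```

In TF 1.x the same batched gather could be written with tf.gather_nd over stacked (batch index, length - 1) pairs; taking outputs[:, -1] instead would return zero vectors for every sequence shorter than max_len.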

See this similar question:

variable-length rnn padding and mask out padding gradients

Upvotes: 4
