Reputation: 597
After reading the answers to this question, I'm still a bit confused about the whole PackedSequence object. As I understand it, it is an object optimized for parallel processing of variable-sized sequences in recurrent models, a problem to which zero padding is one [imperfect] solution. It seems that, given a PackedSequence object, a PyTorch RNN will process each sequence in the batch to its end and not continue to process the padding. So why is padding needed here? Why are there both pack_padded_sequence() and pack_sequence() methods?
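For concreteness, here is a minimal sketch of the behaviour I mean (shapes and names are just illustrative):

```python
import torch
from torch.nn.utils.rnn import pack_sequence

# Two sequences of different lengths (3 and 2 steps), longest first
seqs = [torch.randn(3, 1), torch.randn(2, 1)]

packed = pack_sequence(seqs)  # a PackedSequence, no explicit padding in sight
rnn = torch.nn.RNN(input_size=1, hidden_size=4)
output, h_n = rnn(packed)  # the RNN stops at each sequence's true end
```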
Upvotes: 4
Views: 3682
Reputation: 24701
Mostly for historical reasons; torch.nn.utils.rnn.pack_padded_sequence() was created before torch.nn.utils.rnn.pack_sequence() (the latter first appeared in version 0.4.0, if I see correctly), and I suppose there was no reason to remove this functionality and break backward compatibility.
Furthermore, it's not always clear what the best/fastest way to pad your input is; it depends heavily on the data you are using. When the data was somehow padded beforehand (e.g. it was pre-padded and provided to you like that), it is faster to use pack_padded_sequence() directly (see the source code of pack_sequence; it calculates the length of each data point for you and calls pad_sequence followed by pack_padded_sequence internally, as sketched below). Arguably, pad_packed_sequence is rarely of use right now, though.
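As a rough illustration of that relationship (a minimal sketch, not the actual library source):

```python
import torch
from torch.nn.utils.rnn import (
    pack_sequence, pad_sequence, pack_padded_sequence,
)

seqs = [torch.tensor([1., 2., 3.]), torch.tensor([4., 5.])]  # longest first

# The convenient one-liner:
packed_a = pack_sequence(seqs)

# Roughly what it does under the hood:
lengths = [s.size(0) for s in seqs]  # computes each length for you
padded = pad_sequence(seqs)          # zero-pads to (max_len, batch)
packed_b = pack_padded_sequence(padded, lengths)

assert torch.equal(packed_a.data, packed_b.data)
```

So when your data arrives already padded, calling pack_padded_sequence() directly simply skips the redundant padding pass.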
Lastly, please notice the enforce_sorted argument, provided since version 1.2.0 for both of those functions. Not so long ago, users had to sort their data (or batch) with the longest sequence first and the shortest last; now this can be done internally when this parameter is set to False.
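For example (a small sketch; the tensor values are arbitrary):

```python
import torch
from torch.nn.utils.rnn import pack_sequence

# Shortest sequence first -- this used to require manual sorting
seqs = [torch.tensor([1.]), torch.tensor([2., 3., 4.])]

packed = pack_sequence(seqs, enforce_sorted=False)  # sorted internally now
```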
Upvotes: 5