Reputation: 122052
From the PyTorch Seq2Seq tutorial, http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#attention-decoder, we see that the attention mechanism relies heavily on the MAX_LENGTH parameter to determine the output dimension of attn -> attn_softmax -> attn_weights, i.e.
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)
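For context, here is roughly how those max_length-sized weights get used downstream: they are multiplied against an encoder_outputs tensor that is padded out to max_length rows. (A standalone shape sketch in the spirit of the tutorial, not a verbatim quote; the tensor values here are dummies.)

import torch
import torch.nn as nn
import torch.nn.functional as F

MAX_LENGTH = 10
hidden_size = 256

# attn maps [embedded ; hidden] (2 * hidden_size) to one score per encoder
# position, so it has to commit to MAX_LENGTH positions up front.
attn = nn.Linear(hidden_size * 2, MAX_LENGTH)

embedded = torch.randn(1, hidden_size)                  # decoder input embedding
hidden = torch.randn(1, hidden_size)                    # decoder hidden state
encoder_outputs = torch.zeros(MAX_LENGTH, hidden_size)  # padded to MAX_LENGTH rows

attn_weights = F.softmax(attn(torch.cat((embedded, hidden), 1)), dim=1)  # (1, MAX_LENGTH)
attn_applied = torch.bmm(attn_weights.unsqueeze(0),     # (1, 1, MAX_LENGTH)
                         encoder_outputs.unsqueeze(0))  # (1, MAX_LENGTH, hidden_size)
print(attn_applied.shape)                               # torch.Size([1, 1, 256])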
More specifically:
self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
I understand that the MAX_LENGTH variable is the mechanism for reducing the number of parameters that need to be trained in the AttnDecoderRNN.

If we don't have a pre-determined MAX_LENGTH, what value should we initialize the attn layer with?

Would it be the output_size? If so, then we would be learning attention with respect to the full vocabulary of the target language. Isn't that the real intention of the Bahdanau (2015) attention paper?
Upvotes: 4
Views: 486
Reputation: 18693
Attention modulates the input to the decoder. That is, attention weights the encoded sequence, which has the same length as the input sequence. Thus, MAX_LENGTH should be the maximum sequence length over all of your input sequences.
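If you do not want to fix a MAX_LENGTH at all, the usual alternative (and closer to Bahdanau et al., 2015) is to score each encoder time step against the decoder state, so the number of attention weights equals the actual source length rather than a padded maximum. A minimal sketch, assuming additive (Bahdanau-style) attention; this is not the tutorial's implementation, and the class and parameter names are illustrative:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, hidden_size):
        super(AdditiveAttention, self).__init__()
        self.W_dec = nn.Linear(hidden_size, hidden_size)  # projects the decoder state
        self.W_enc = nn.Linear(hidden_size, hidden_size)  # projects the encoder outputs
        self.v = nn.Linear(hidden_size, 1)                # scores each time step

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden: (batch, hidden_size)
        # encoder_outputs: (batch, seq_len, hidden_size) -- seq_len can vary per batch
        scores = self.v(torch.tanh(
            self.W_dec(decoder_hidden).unsqueeze(1) + self.W_enc(encoder_outputs)
        )).squeeze(-1)                                    # (batch, seq_len)
        attn_weights = F.softmax(scores, dim=1)           # one weight per input position
        context = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs)
        return context.squeeze(1), attn_weights           # (batch, hidden_size), (batch, seq_len)

Because the scores are computed per time step, nothing here depends on a maximum length; if you batch variable-length sequences, you simply mask the padded positions before the softmax. The attention layer's output size never needs to be output_size: attention is over source positions, not over the target vocabulary.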
Upvotes: 5