Reputation: 592
I'm currently trying to compute the Bahdanau attention score function, i.e. score = v^T · tanh(W_decoder · h_decoder + W_encoder · h_encoder).
My question is about the h states of the decoder and the encoder.
In one implementation, I see an encoder state h_encoder with dimensions: [max source len, batch size, hidden size]
and a decoder state h_decoder with dimensions: [#lstm layers, batch size, hidden size]
How can I compute the addition if the outputs of the W projections have to have the same dimensions, as described here: https://blog.floydhub.com/attention-mechanism/#bahdanau-att-step1?
Thanks for the help
Upvotes: 0
Views: 179
Reputation: 11240
In the original Bahdanau paper, the decoder has only a single LSTM layer. There are various ways to deal with multiple layers. A fairly common choice is to put the attention between the layers (which you apparently did not do; see, e.g., a paper by Google). If you use multiple decoder layers the way you describe, you can either use only the last layer (i.e., h_decoder[-1]) or concatenate the per-layer states along the hidden dimension (in PyTorch with torch.cat, in TensorFlow with tf.concat).
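A minimal PyTorch sketch of the two options, assuming the decoder-state shape from the question and illustrative sizes of my own:

```python
import torch

# Illustrative sizes (not from the question); h_decoder follows [num_layers, batch, hidden]
num_layers, batch, hidden = 2, 4, 8
h_decoder = torch.randn(num_layers, batch, hidden)

# Option 1: keep only the last decoder layer -> [batch, hidden]
dec_last = h_decoder[-1]

# Option 2: concatenate the per-layer states along the hidden dimension -> [batch, num_layers * hidden]
dec_cat = torch.cat(tuple(h_decoder), dim=-1)
```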
The matrices W_decoder and W_encoder ensure that both the encoder and decoder states get projected to the same dimension (regardless of what you did with the decoder layers), so you can do the summation.
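For example (a sketch with my own names and made-up sizes, not your implementation), the two projections can be plain linear layers mapping both states to a shared attention dimension:

```python
import torch
import torch.nn as nn

# Illustrative sizes and names (my own, not from the question)
max_src_len, batch, enc_hidden, dec_hidden, attn_dim = 5, 4, 8, 8, 8

h_encoder = torch.randn(max_src_len, batch, enc_hidden)  # encoder outputs
dec_last = torch.randn(batch, dec_hidden)                # e.g. h_decoder[-1]

W_encoder = nn.Linear(enc_hidden, attn_dim, bias=False)  # projects encoder states
W_decoder = nn.Linear(dec_hidden, attn_dim, bias=False)  # projects the decoder state

proj_enc = W_encoder(h_encoder)  # [max_src_len, batch, attn_dim]
proj_dec = W_decoder(dec_last)   # [batch, attn_dim] -- same last dimension, so the two can be summed
```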
The only remaining issue is that the encoder states have the extra max-length dimension. The trick is to add a dimension to the projected decoder state, so the summation gets broadcasted and the projected decoder state gets summed with all the encoder states. In PyTorch, just call unsqueeze on the projected decoder state in the 0-th dimension; in TensorFlow, use expand_dims.
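Putting it together, a hedged PyTorch sketch of the additive score with the broadcasted sum (again with illustrative sizes and my own names):

```python
import torch
import torch.nn as nn

# Illustrative sizes and names (my own, not from the question)
max_src_len, batch, hidden, attn_dim = 5, 4, 8, 8

h_encoder = torch.randn(max_src_len, batch, hidden)  # [max_src_len, batch, hidden]
dec_last = torch.randn(batch, hidden)                # last-layer decoder state

W_encoder = nn.Linear(hidden, attn_dim, bias=False)
W_decoder = nn.Linear(hidden, attn_dim, bias=False)
v = nn.Linear(attn_dim, 1, bias=False)

proj_enc = W_encoder(h_encoder)              # [max_src_len, batch, attn_dim]
proj_dec = W_decoder(dec_last).unsqueeze(0)  # [1, batch, attn_dim], broadcasts over source length

scores = v(torch.tanh(proj_enc + proj_dec)).squeeze(-1)   # [max_src_len, batch]
weights = torch.softmax(scores, dim=0)                     # attention distribution over source positions
context = (weights.unsqueeze(-1) * h_encoder).sum(dim=0)   # [batch, hidden] context vector
```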
Upvotes: 1