Reputation: 5327
I have implemented a model as explained in the Keras example Text classification with Transformer (https://keras.io/examples/nlp/text_classification_with_transformer/).
I would like to access the attention values for a specific example.
I understand attention is calculated somewhere around this point:
class TransformerBlock(layers.Layer):
    [...]
    def call(self, inputs, training):
        attn_output = self.att(inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)
[...]
embed_dim = 32 # Embedding size for each token
num_heads = 2 # Number of attention heads
ff_dim = 32 # Hidden layer size in feed forward network inside transformer
inputs = layers.Input(shape=(maxlen,))
embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
x = embedding_layer(inputs)
transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
x = transformer_block(x)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(20, activation="relu")(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(2, activation="softmax")(x)
If I do:

A = model.layers[2].att(model.layers[1](model.layers[0](X_train[0, :])))

I can retrieve a matrix of size maxlen x num_heads.
How should I interpret these coefficients?
Upvotes: 1
Views: 1919
Reputation: 309
EDIT: In case you want to interpret the classification output using attention
As far as I know, it is impossible to fully interpret what the Transformer does in classification. The Transformer only learns how each input token relates to the other tokens, not how each word contributes to the label. If you want a model that is interpretable, try looking at attention-based LSTMs for classification.
OK, so I've read your code and spotted some mistakes in how you call model.layers[1]. First, you need to understand that the model processes data in batches, so your input should have the shape (batch_size, seq_len). However, your input drops the first (batch) dimension, which makes the model think you are giving it 200 sentences with a sequence length of 1. That is why the output shape looks strange, as seen in the image.
The correct method is to add an extra dimension at the front (using tf.expand_dims), as in the snippet below.
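For example (a minimal sketch, assuming x_train is the padded, integer-encoded training data from the tutorial):

import tensorflow as tf

# x_train[0, :] has shape (maxlen,); the model expects (batch_size, maxlen),
# so we add a batch dimension of size 1 at the front.
inp = tf.expand_dims(x_train[0, :], axis=0)  # shape: (1, maxlen)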
Now, for interpreting the results: you need to know that the Transformer block performs self-attention (which computes a score between every word and every other word in the sentence) and then takes a weighted sum of the value vectors. Thus, the output has the same shape as the embedding layer's output, and you wouldn't be able to explain it directly (it is a hidden vector generated by the network).
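To make the scoring step concrete, here is a rough sketch of the scaled dot-product attention that the tutorial's MultiHeadSelfAttention.attention method computes (function and variable names here are illustrative, not taken from the tutorial):

import tensorflow as tf

def scaled_dot_product_attention(query, key, value):
    # query, key, value: (batch_size, num_heads, seq_len, head_dim)
    scores = tf.matmul(query, key, transpose_b=True)        # (batch, heads, seq, seq)
    dim_key = tf.cast(tf.shape(key)[-1], tf.float32)
    weights = tf.nn.softmax(scores / tf.math.sqrt(dim_key), axis=-1)  # each row sums to 1
    output = tf.matmul(weights, value)                       # weighted sum of value vectors
    return output, weights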
However, you can see the attention scores for each head using the following code:
import tensorflow as tf
from tensorflow import keras
import seaborn as sns
import matplotlib.pyplot as plt

head_num = 1  # which head to visualize (0 or 1, since num_heads = 2)

# add a batch dimension and run the token + position embedding
inp = tf.expand_dims(x_train[0, :], axis=0)   # shape: (1, maxlen)
emb = model.layers[1](model.layers[0](inp))

# the MultiHeadSelfAttention layer inside the TransformerBlock
self_attn = model.layers[2].att

# compute Q, K, V
query = self_attn.query_dense(emb)
key = self_attn.key_dense(emb)
value = self_attn.value_dense(emb)

# separate heads (batch_size = 1)
query = self_attn.separate_heads(query, 1)
key = self_attn.separate_heads(key, 1)
value = self_attn.separate_heads(value, 1)

# compute attention; weights has shape (1, num_heads, maxlen, maxlen)
attention, weights = self_attn.attention(query, key, value)

# map token ids back to words; keras.datasets.imdb.load_data offsets indices by 3,
# so ids 0-2 (padding/start/OOV) fall back to "?"
word_index = keras.datasets.imdb.get_word_index()
idx_word = {v + 3: k for k, v in word_index.items()}
tokens = [idx_word.get(idx, "?") for idx in inp[0].numpy()]

plt.figure(figsize=(30, 30))
sns.heatmap(
    weights.numpy()[0][head_num],
    xticklabels=tokens,
    yticklabels=tokens,
)
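If you prefer a textual view instead of a heatmap, a small illustrative follow-up (reusing weights, head_num and tokens from above; the query position chosen here is arbitrary) could list the words a given token attends to most:

import numpy as np

# row i of the attention matrix is token i's attention distribution over the sequence
attn = weights.numpy()[0][head_num]   # shape: (maxlen, maxlen)

query_pos = 10                        # illustrative: inspect the 11th token
top_k = np.argsort(attn[query_pos])[::-1][:5]
print("token:", tokens[query_pos])
print("attends most to:", [(tokens[j], float(attn[query_pos, j])) for j in top_k])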
Upvotes: 2