Reputation: 323
I'm trying to understand the newly implemented Keras Transformer class: https://keras.io/examples/nlp/text_classification_with_transformer/
I see that the text is first embedded and then self-attention is applied. But what if I want to use an embedding other than the TokenAndPositionEmbedding - e.g. in my case I have pre-embedded sentences and would like to apply self-attention to them?
What I don't understand is what self.pos_emb does. The class TokenAndPositionEmbedding returns x and positions, with x being the token_embedding and positions being the number of words to consider? So it's basically returning two things? I don't understand that.
import tensorflow as tf
from tensorflow.keras import layers

class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim):
        super(TokenAndPositionEmbedding, self).__init__()
        # embedding for the token ids
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        # embedding for the positions 0 .. maxlen-1
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        # a single tensor: token embedding plus position embedding
        return x + positions
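For reference, a minimal sketch of how I understand this layer being called (the sizes are just my assumption, roughly matching the tutorial):

import tensorflow as tf

embedding_layer = TokenAndPositionEmbedding(maxlen=200, vocab_size=20000, embed_dim=32)
dummy_ids = tf.zeros((1, 200), dtype=tf.int32)  # one padded sentence of token ids
out = embedding_layer(dummy_ids)
print(out.shape)  # (1, 200, 32): one tensor of shape (batch, maxlen, embed_dim)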
Or do I just feed my embedded sentences to MultiHeadSelfAttention and put a Dense layer after it for classification purposes?
Upvotes: 1
Views: 2500
Reputation: 188
As you know, the Transformer is a structure built from little more than stacks of Dense layers with residual connections; however, this makes sequential data lose its time (order) dependence. So for the Transformer you need to encode the position, which you can think of as additional information given to the structure so that it does not lose the time dependence. If you would like to understand it better with Keras, I suggest the official tutorial written by TensorFlow: https://www.tensorflow.org/tutorials/text/transformer which details the things you would like to know.
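For the pre-embedded sentences from your question, here is a minimal sketch of what that could look like (my assumption, reusing the position-embedding idea from TokenAndPositionEmbedding and the built-in keras.layers.MultiHeadAttention in place of the tutorial's custom MultiHeadSelfAttention; the sizes are made up):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class PositionEmbedding(layers.Layer):
    """Adds a learned position embedding to already-embedded inputs."""
    def __init__(self, maxlen, embed_dim):
        super(PositionEmbedding, self).__init__()
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-2]  # sequence-length axis of (batch, maxlen, embed_dim)
        positions = tf.range(start=0, limit=maxlen, delta=1)
        return x + self.pos_emb(positions)

maxlen, embed_dim, num_heads, num_classes = 200, 32, 2, 2  # assumed sizes for illustration

inputs = keras.Input(shape=(maxlen, embed_dim))  # pre-embedded sentences
x = PositionEmbedding(maxlen, embed_dim)(inputs)
# self-attention over the sequence (query and value are the same tensor)
x = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(x, x)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)
model = keras.Model(inputs, outputs)

The point is only that the position embedding is added to your pre-embedded inputs before the attention layer, so the word order is not lost.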
Upvotes: 1