Reputation: 2731
I am reading the Transformer paper, and the positional embeddings make me wonder about something:
Assume that the word "cat" is pretrained to be embedded as the word vector [2,3,1,4]. If the positional encoding turns that vector into a new one, say [3,1,5,2], shouldn't that also change the word's meaning in the word2vec matrix? Since the corpus is large, even a slight change in the values could change the word's meaning.
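For concreteness, here is a minimal sketch of what I mean (the numbers, including the positional vector, are made up for illustration):

```python
import numpy as np

cat = np.array([2, 3, 1, 4])    # hypothetical pretrained word2vec vector for "cat"
pos = np.array([1, -2, 4, -2])  # hypothetical positional encoding for this position

print(cat + pos)                # [3 1 5 2] -- the changed vector I am asking about
```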
Upvotes: 0
Views: 764
Reputation: 2704
word2vec and the Transformer treat tokens completely differently.
word2vec is context-free, which means bank is always the same fixed vector from the word2vec matrix; in other words, the vector for bank does not depend on the token's position in the sentence.
The Transformer, on the other hand, receives as input the tokens' embeddings plus positional embeddings, which add a sense of position to the tokens. Without them, it would treat the text as a bag of words rather than as a sequence.
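As a minimal sketch of that last point (toy dimensions, a random stand-in embedding table, and names like `embedding_table` chosen only for illustration): the sinusoidal encoding from the paper is merely added to form the model's input, and the stored embedding vectors themselves are never touched:

```python
import numpy as np

d_model = 4                                   # toy embedding dimension
vocab = {"the": 0, "cat": 1, "sat": 2}

# Stand-in for a pretrained embedding table -- this matrix is never modified.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings as defined in 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model)[None, :]            # (1, d_model)
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

tokens = ["the", "cat", "sat"]
emb = embedding_table[[vocab[t] for t in tokens]]     # context-free lookup
x = emb + positional_encoding(len(tokens), d_model)   # what the Transformer actually sees

print(embedding_table[vocab["cat"]])   # stored vector for "cat": unchanged
print(x[1])                            # position-aware input for "cat" at position 1
```

The addition only affects the activations fed into the first layer; the word2vec (or learned embedding) matrix keeps its original values.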
Upvotes: 1