Reputation: 1343
Given a sentence 'hello world', the vocabulary is
{hello, world} + {<hel, hell, ello, llo>, <wor, worl, orld, rld>};
for convenience, only the 4-grams are listed.
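As a minimal sketch of how those character n-grams could be enumerated (the `<` and `>` boundary markers are part of the FastText scheme; the function name is my own):

```python
def char_ngrams(word, n=4):
    """Return all character n-grams of a word, with boundary markers added."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("hello"))  # ['<hel', 'hell', 'ello', 'llo>']
print(char_ngrams("world"))  # ['<wor', 'worl', 'orld', 'rld>']
```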
In my understanding, the word2vec skipgram maximizes the average log-probability of the context words:

$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$
What will fasttext skipgram do?
Upvotes: 0
Views: 1577
Reputation: 11220
The optimization criterion is the same; the difference is how the model computes the word vectors.
FastText optimizes the same criterion as the standard skip-gram model (in the notation of the FastText paper):

$\sum_{t=1}^{T} \sum_{c \in \mathcal{C}_t} \log p(w_c \mid w_t),$

with all the approximation tricks that make the optimization computationally efficient. In the end, they get this negative-sampling loss:

$\log\left(1 + e^{-s(w_t, w_c)}\right) + \sum_{n \in \mathcal{N}_{t,c}} \log\left(1 + e^{s(w_t, n)}\right)$
The loss is summed over all context words $w_c$, and the softmax denominator is approximated using a set of negative samples $\mathcal{N}_{t,c}$. The crucial difference is in the scoring function $s$. In the original skip-gram model, it is a dot product of the two word embeddings: $s(w_t, w_c) = u_{w_t}^\top v_{w_c}$.
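The negative-sampling loss for a single (target, context) pair can be sketched as follows (the function name is my own; `s_pos` is the score of the true context word, `s_negs` the scores of the sampled negatives):

```python
import math

def ns_loss(s_pos, s_negs):
    """Negative-sampling loss for one (target, context) pair:
    log(1 + e^{-s(w_t, w_c)}) + sum over negatives n of log(1 + e^{s(w_t, n)}).
    A high score for the true pair and low scores for negatives give a small loss."""
    return math.log1p(math.exp(-s_pos)) + sum(math.log1p(math.exp(s)) for s in s_negs)

print(ns_loss(5.0, [-4.0, -3.0]))  # small: true pair scored high, negatives low
print(ns_loss(-5.0, [4.0, 3.0]))   # large: the opposite
```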
However, in the FastText case, the function $s$ is redefined:

$s(w, c) = \sum_{g \in \mathcal{G}_w} z_g^\top v_c$

Word $w_t$ is represented as a sum of the vectors $z_g$ of all n-grams the word consists of, plus a vector for the word itself. You basically want to make not only the word, but also all its substrings, probable in the given context window.
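A toy sketch of that scoring function, using random (untrained) vectors purely for illustration; the tables and names are assumptions, not FastText's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5

# Toy embedding tables: one vector z_g per n-gram plus one for the whole
# word ('<hello>'), and a separate table of context vectors v_c.
ngram_vecs = {g: rng.normal(size=dim)
              for g in ["<hel", "hell", "ello", "llo>", "<hello>"]}
context_vecs = {"world": rng.normal(size=dim)}

def score(word_ngrams, context_word):
    """FastText-style score s(w, c): sum the word's n-gram vectors
    (including the whole-word vector), then dot with the context vector."""
    w = sum(ngram_vecs[g] for g in word_ngrams)
    return float(w @ context_vecs[context_word])

s = score(["<hel", "hell", "ello", "llo>", "<hello>"], "world")
print(s)
```

Because the score is linear in the n-gram vectors, every substring of the word shares credit for a good prediction, which is what lets FastText build vectors for out-of-vocabulary words from their n-grams alone.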
Upvotes: 3