Animeta

Reputation: 1343

What's the difference between fasttext skipgram and word2vec skipgram?

Given a sentence 'hello world', the vocabulary is

{hello, world} + {<hel, hell, ello, llo>, <wor, worl, orld, rld>},

(for convenience, only the 4-grams are listed).
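The n-grams above come from wrapping each word in boundary symbols `<` and `>` before slicing, as FastText does. A minimal sketch (a hypothetical helper, not the actual fastText implementation):

```python
# Sketch of FastText-style character n-gram extraction: the word is wrapped
# in boundary symbols "<" and ">" and then sliced into overlapping n-grams.
def char_ngrams(word, n=4):
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("hello"))  # ['<hel', 'hell', 'ello', 'llo>']
print(char_ngrams("world"))  # ['<wor', 'worl', 'orld', 'rld>']
```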

In my comprehension, the word2vec skipgram will maximize the log-probability of the context words given the target word:

$$\sum_{t=1}^{T} \sum_{c \in \mathcal{C}_t} \log p(w_c \mid w_t)$$

What will fasttext skipgram do?

Upvotes: 0

Views: 1577

Answers (1)

Jindřich

Reputation: 11220

tl;dr

The optimization criterion is the same, the difference is how the model gets the word vector.

Using formulas

Fasttext optimizes the same criterion as the standard skipgram model (using the formula from the FastText paper):

$$\sum_{t=1}^{T} \sum_{c \in \mathcal{C}_t} \log p(w_c \mid w_t)$$

with all the approximation tricks that make the optimization computationally efficient. In the end, they get this negative-sampling objective for a target word $w_t$ and a context word $w_c$:

$$\log\left(1 + e^{-s(w_t,\, w_c)}\right) + \sum_{n \in \mathcal{N}_{t,c}} \log\left(1 + e^{s(w_t,\, n)}\right)$$

The outer objective sums over all context words $w_c$, and the softmax denominator is approximated using a set of negative samples $n \in \mathcal{N}_{t,c}$. The crucial difference is in the scoring function $s$. In the original skip-gram model, it is a dot product of the two word embeddings.
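The negative-sampling objective for one (target, context) pair can be sketched as follows, assuming toy embeddings and a plain dot product for $s$, as in the original skip-gram model:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Minimal sketch of the negative-sampling loss for one (target, context) pair.
def neg_sampling_loss(v_target, v_context, negative_vectors):
    # log(1 + e^{-s(w_t, w_c)}): drive the true pair's score up ...
    loss = math.log(1.0 + math.exp(-dot(v_target, v_context)))
    # ... plus log(1 + e^{s(w_t, n)}) per negative sample: drive those scores down.
    for v_neg in negative_vectors:
        loss += math.log(1.0 + math.exp(dot(v_target, v_neg)))
    return loss
```

Minimizing this loss is equivalent to maximizing the original log-probability criterion; a well-trained pair gives the true context a high score and the negatives low scores.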

However, in the FastText case, the function $s$ is redefined:

$$s(w, c) = \sum_{g \in \mathcal{G}_w} z_g^{\top} v_c$$

Word $w_t$ is represented as the sum of the vectors $z_g$ of all n-grams $g \in \mathcal{G}_w$ the word consists of, plus a vector for the word itself. You basically want to make not only the word, but also all its substrings, probable in the given context window.
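The redefined scoring function can be sketched with toy 2-d embeddings (the vectors below are made-up values for illustration only):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# FastText scoring: s(w, c) = sum over the word's n-gram vectors z_g
# (plus a vector for the word itself), each dotted with the context vector v_c.
def fasttext_score(ngram_vectors, context_vector):
    # Summing the dot products equals dotting the summed word representation.
    return sum(dot(z_g, context_vector) for z_g in ngram_vectors)

# Toy n-gram vectors for "hello" (4-grams plus the word itself):
z = {"<hel": [1.0, 0.0], "hell": [0.0, 1.0], "ello": [1.0, 1.0],
     "llo>": [0.0, 0.0], "hello": [1.0, -1.0]}
v_c = [0.5, 0.5]
print(fasttext_score(z.values(), v_c))  # 2.0
```

Because the score is linear in the n-gram vectors, an unseen word still gets a usable representation from whichever of its n-grams were seen during training.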

Upvotes: 3
