Reputation: 469
Classical CBOW word2vec looks like this (diagram of the network: input layer, WI matrix, hidden layer, WO matrix, output layer):
What is the vector for a specific word in this scheme? How is it obtained from the WI and WO matrices? Or are useful word-vectors obtained only from skip-gram word2vec?
Upvotes: 0
Views: 216
Reputation: 54173
With regard to the diagram you've shown, each row in the WI matrix is a word-vector. (After training, when you ask the model for a word like 'cat', it looks up which slot from 0 to V-1 stores 'cat', then returns that row of the WI matrix.)
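A minimal sketch of that lookup, with a made-up vocabulary, index mapping, and vector size chosen only for illustration:

```python
import numpy as np

# Toy vocabulary: word -> row index into WI (illustrative, not from the question)
vocab = {"the": 0, "cat": 1, "sat": 2}
V, N = len(vocab), 4          # vocabulary size, vector dimensionality

WI = np.random.uniform(-0.5 / N, 0.5 / N, size=(V, N))  # one row per word

def word_vector(word):
    # The word's vector is simply its row of WI.
    return WI[vocab[word]]

print(word_vector("cat"))     # the row of WI stored in 'cat's slot
```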
WI is initialized with random, low-magnitude vectors, while WO starts as all zeros. During training, rows of WI and WO are repeatedly nudged via back-propagation so that the network's output layer becomes more predictive of each (context)->(word) training example.
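A sketch of that initialization; V and N are illustrative values, and WO is stored here as one weight row per vocabulary word (some diagrams draw it transposed, as N x V):

```python
import numpy as np

V, N = 10000, 100   # illustrative vocabulary size and vector dimensionality

WI = np.random.uniform(-0.5 / N, 0.5 / N, size=(V, N))  # small random word-vectors
WO = np.zeros((V, N))                                    # output weights start at zero
```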
For skip-gram, you can think of the input layer in this diagram as a one-hot encoding of the single input word. For CBOW, you can think of the input layer as holding the count of each word in the multi-word context as the xi values – most of them zero (sparse). In practice, CBOW looks up each context word in WI and averages their word-vectors to create the hidden-layer activation.
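An illustrative hidden-layer computation for both architectures, assuming a toy word -> index mapping and an arbitrary vector size:

```python
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
N = 100
WI = np.random.uniform(-0.5 / N, 0.5 / N, size=(len(vocab), N))

def hidden_skipgram(input_word):
    # Skip-gram: the hidden layer is just the single input word's row of WI,
    # equivalent to multiplying a one-hot input vector by WI.
    return WI[vocab[input_word]]

def hidden_cbow(context_words):
    # CBOW: look up each context word in WI and average those rows.
    rows = WI[[vocab[w] for w in context_words]]
    return rows.mean(axis=0)

h = hidden_cbow(["the", "sat", "on", "mat"])  # context around the target 'cat'
```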
Both skip-gram and CBOW work OK to create useful word-vectors inside WI.
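If you just want the resulting vectors, here is a usage sketch with gensim (assuming gensim 4.x, where sg=0 selects CBOW and sg=1 selects skip-gram); the two-sentence corpus is made up:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "log"]]  # toy corpus

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Either way, asking for 'cat' returns that word's row of the input matrix.
print(cbow.wv["cat"])
print(skipgram.wv["cat"])
```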
Upvotes: 1