Reputation: 469
Classical CBOW word2vec looks like this (diagram of the network: input layer, WI matrix, hidden layer, WO matrix, output layer):
What is the vector for a specific word in this scheme? How is it obtained from the WI and WO matrices? Or are useful word-vectors obtained only from skip-gram word2vec?
Upvotes: 0
Views: 216
Reputation: 54173
With regard to the diagram you've shown, each row in the WI matrix is a word-vector. (After training, when you ask the model for a word like 'cat', it looks up which slot from 0 to V-1 stores 'cat', then returns that row of the WI matrix.)
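A minimal sketch of that lookup, with a made-up vocabulary, index mapping, and vector size chosen only for illustration:

```python
import numpy as np

# Toy vocabulary: word -> row index into WI (illustrative, not from the question)
vocab = {"the": 0, "cat": 1, "sat": 2}
V, N = len(vocab), 4          # vocabulary size, vector dimensionality

WI = np.random.uniform(-0.5 / N, 0.5 / N, size=(V, N))  # one row per word

def word_vector(word):
    # The word's vector is simply its row of WI.
    return WI[vocab[word]]

print(word_vector("cat"))     # the row of WI stored in 'cat's slot
```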
WI is initialized with random, low-magnitude vectors, while WO starts as all zeros. During training, rows of WI and WO are repeatedly nudged via back-propagation so that the network's output layer becomes more predictive of each (context)->(word) training example.
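A sketch of that initialization; V and N are illustrative values, and WO is stored here as one weight row per vocabulary word (some diagrams draw it transposed, as N x V):

```python
import numpy as np

V, N = 10000, 100   # illustrative vocabulary size and vector dimensionality

WI = np.random.uniform(-0.5 / N, 0.5 / N, size=(V, N))  # small random word-vectors
WO = np.zeros((V, N))                                    # output weights start at zero
```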
For skip-gram, you can think of the input layer in this diagram as a one-hot encoding of the single input word. For CBOW, you can think of the input layer as holding the count of each word in the multi-word context as the xi values – most of them zero (sparse). In practice, CBOW looks up each context word in WI and averages their word-vectors to create the hidden-layer activation.
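An illustrative hidden-layer computation for both architectures, assuming a toy word -> index mapping and an arbitrary vector size:

```python
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
N = 100
WI = np.random.uniform(-0.5 / N, 0.5 / N, size=(len(vocab), N))

def hidden_skipgram(input_word):
    # Skip-gram: the hidden layer is just the single input word's row of WI,
    # equivalent to multiplying a one-hot input vector by WI.
    return WI[vocab[input_word]]

def hidden_cbow(context_words):
    # CBOW: look up each context word in WI and average those rows.
    rows = WI[[vocab[w] for w in context_words]]
    return rows.mean(axis=0)

h = hidden_cbow(["the", "sat", "on", "mat"])  # context around the target 'cat'
```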
Both skip-gram and CBOW work OK to create useful word-vectors inside WI.
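If you just want the resulting vectors, here is a usage sketch with gensim (assuming gensim 4.x, where sg=0 selects CBOW and sg=1 selects skip-gram); the two-sentence corpus is made up:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "log"]]  # toy corpus

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Either way, asking for 'cat' returns that word's row of the input matrix.
print(cbow.wv["cat"])
print(skipgram.wv["cat"])
```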
Upvotes: 1