word2vec gives vectors for very few words in a text. Why?

When I provide a text document as input to word2vec, it assigns vectors to only a few of the words in the text's vocabulary. Why does this happen, and how can I overcome this problem?

Upvotes: 0

Views: 216

Answers (1)

John Wakefield

Reputation: 527

I think the reason you are seeing very few vectors is that your corpus is too small. Word2vec removes infrequently occurring words from the vocabulary. This is controlled by the -min-count command-line switch, which defaults to 5 in the original source code. Any word that occurs fewer than this many times in your corpus is removed and gets no vector.
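To see the effect of that threshold, here is a minimal sketch in plain Python. It only imitates the frequency filter described above and is not word2vec's actual code; the function name surviving_vocab and the toy corpus are made up for illustration.

```python
from collections import Counter

def surviving_vocab(tokens, min_count=5):
    """Return the set of words that occur at least min_count times.

    Hypothetical helper: it mimics the min-count cutoff only,
    it does not train any vectors.
    """
    counts = Counter(tokens)
    return {word for word, n in counts.items() if n >= min_count}

# Toy corpus: five repeated sentences plus five words that appear once each.
corpus = ("the cat sat on the mat " * 5 + "a rare word appears once").split()

kept_default = surviving_vocab(corpus)            # threshold of 5
kept_all = surviving_vocab(corpus, min_count=1)   # keep every word

print(sorted(kept_default))
print(len(kept_all))
```

With the threshold at 5, only the repeated words survive; with it at 1, every word is kept. In the original tool the equivalent would be passing -min-count 1 on the command line, though vectors for words seen only once or twice tend to be poorly trained, so a larger corpus is usually the better fix.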

Upvotes: 2
