Reputation: 53806
I've performed the steps in this guide to generate a vector representation of words.
Now I'm running word2vec on a custom dataset of 45,000 words.
To use my own dataset I modified word2vec_basic.py, changing https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/word2vec/word2vec_basic.py#L57 to words = read_data('mytextfile.zip')
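For reference, this is roughly what that part of my script looks like now (a minimal sketch; I'm assuming the zip holds a single whitespace-separated UTF-8 text file, and the tutorial's version decodes with tf.compat.as_str rather than decode):

    import zipfile

    def read_data(filename):
        """Return the first file in the zip archive as a list of tokens."""
        with zipfile.ZipFile(filename) as f:
            data = f.read(f.namelist()[0]).decode('utf-8').split()
        return data

    # the line I changed at word2vec_basic.py#L57
    words = read_data('mytextfile.zip')
    print('Data size', len(words))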
I encountered an issue similar to https://github.com/tensorflow/tensorflow/issues/2777 and so reduced the vocabulary_size
to 200. It now runs, but the results do not appear to be capturing the context. For example, here is a sample output:
Nearest to Leave: Employee, it, •, due, You, appeal, Employees, which,
What can I infer from this output? Will increasing/decreasing vocabulary_size
improve results?
I'm using Python 3, so to run the script I use python3 word2vec_basic2.py
Upvotes: 1
Views: 794
Reputation: 7130
If the vocabulary_size is too small, most of the words will be marked as UNK:
unk_count = 0
for word in words:
    if word in dictionary:
        index = dictionary[word]
    else:
        index = 0  # dictionary['UNK']
        unk_count += 1
    data.append(index)
count[0][1] = unk_count
and hence every word that is not contained in the dictionary will be treated the same (indexed as 0). Increasing the vocabulary size will definitely improve the results. It matters little whether you use Python 2 or Python 3.
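As a rough sanity check (this is not part of word2vec_basic.py; unk_fraction is just a helper name I made up), you can estimate how much of your corpus collapses into UNK for a given vocabulary_size:

    import collections

    def unk_fraction(words, vocabulary_size):
        """Fraction of corpus tokens that fall outside the kept vocabulary."""
        counts = collections.Counter(words)
        # the tutorial keeps the (vocabulary_size - 1) most frequent words,
        # reserving one slot for UNK
        vocab = {w for w, _ in counts.most_common(vocabulary_size - 1)}
        unk = sum(1 for w in words if w not in vocab)
        return unk / len(words)

    # e.g. compare the 200-word vocabulary from the question with a larger one:
    # words = read_data('mytextfile.zip')
    # print(unk_fraction(words, 200), unk_fraction(words, 20000))

If that fraction is close to 1 for vocabulary_size = 200, the model is mostly being trained on UNK tokens.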
For illustration, suppose there are 128 input words in the first batch but 120 of them are marked as unknown (all sharing index 0), and the targets likewise contain far fewer than 128 distinct words. What happens? We end up predicting pairs such as UNK from UNK and "you" from UNK, which would have been "you" from "are" and "you" from "?" had you increased the vocabulary size. Most of the information in the sample of the input distribution is lost.
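Here is a toy sketch of that effect (a context window of 1, not the tutorial's generate_batch): with a 2-word vocabulary, most skip-gram pairs involve index 0 and carry no usable context.

    # map a tiny sentence to indices with vocabulary_size = 2 ('UNK' plus 'you')
    sentence = 'how are you today ?'.split()
    dictionary = {'UNK': 0, 'you': 1}
    data = [dictionary.get(w, 0) for w in sentence]   # -> [0, 0, 1, 0, 0]

    # all (input, target) pairs within a window of 1
    pairs = [(data[i], data[j])
             for i in range(len(data))
             for j in (i - 1, i + 1)
             if 0 <= j < len(data)]
    print(pairs)
    # half the pairs are (0, 0): "predict UNK from UNK" says nothing, and the
    # remaining pairs pair "you" with UNK instead of its real context words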
Upvotes: 1