javaMan

Reputation: 6680

How to convert word2vec to glove format

I did some research and found that gensim has a script, glove2word2vec, that converts GloVe format to word2vec format. I am looking to do the opposite.

Is there any simple way to convert using gensim or any other library?

Upvotes: 3

Views: 3184

Answers (1)

aneesh joshi

Reputation: 583

The only difference between the GloVe vector file format and the word2vec text format is one header line at the beginning of the word2vec .txt file, which has

<num words> <num dimensions>

Otherwise the vectors are represented in the same manner. We do not need to change the vectors to change the format.

Quoting the page you linked in the question:

Both files are
presented in text format and almost identical except that word2vec includes
number of vectors and its dimension which is only difference regard to GloVe.
Notes
-----
GloVe format (real example can be founded `on Stanford size <https://nlp.stanford.edu/projects/glove/>`_) ::
    word1 0.123 0.134 0.532 0.152
    word2 0.934 0.412 0.532 0.159
    word3 0.334 0.241 0.324 0.188
    ...
    word9 0.334 0.241 0.324 0.188
Word2Vec format (real example can be founded `on w2v old repository <https://code.google.com/archive/p/word2vec/>`_) ::
    9 4
    word1 0.123 0.134 0.532 0.152
    word2 0.934 0.412 0.532 0.159
    word3 0.334 0.241 0.324 0.188
    ...
    word9 0.334 0.241 0.324 0.188

In the above example, word2vec's first line 9 4 tells us that we have 9 words in the vocabulary which have 4 dimensions each.

TL;DR: to convert from w2v -> glove, remove the <num words> <num dimensions> line from the w2v file. Nothing is lost, because both numbers can be inferred from the remaining lines anyway.
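
As a rough illustration of the header removal, here is a minimal sketch in plain Python. The function name `word2vec_to_glove` is made up for this example (it is not a gensim API), and it assumes a UTF-8, text-format (not binary) word2vec file:

```python
def word2vec_to_glove(src_path, dst_path, encoding="utf-8"):
    """Copy a word2vec text file to GloVe text format by dropping the header.

    Hypothetical helper, not part of gensim. Assumes a text-format
    (not binary) word2vec file.
    """
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        header = src.readline().split()
        # A word2vec text header is two integers: <num words> <num dimensions>.
        if len(header) != 2 or not all(tok.isdigit() for tok in header):
            raise ValueError(f"no word2vec header found in {src_path!r}")
        # Every remaining line is already a valid GloVe line; copy it unchanged.
        for line in src:
            dst.write(line)
    return int(header[0]), int(header[1])
```

The function returns the header values so the caller can sanity-check them against the file, for example `num_words, num_dims = word2vec_to_glove("w2v.txt", "glove.txt")` (file names here are placeholders).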

To convert from glove -> w2v: add the <num words> <num dimensions> line at the top of the GloVe file.
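
A manual version of this direction could look roughly like the sketch below. The name `glove_to_word2vec` is hypothetical (chosen for this example, not gensim's script), and it assumes a UTF-8 text file in which tokens contain no whitespace and every line has the same number of values:

```python
def glove_to_word2vec(src_path, dst_path, encoding="utf-8"):
    """Write a word2vec text file by prepending a header to a GloVe file.

    Hypothetical helper, not part of gensim. Assumes whitespace-separated
    lines of the form: <word> <v1> <v2> ... <vN>.
    """
    num_words = 0
    num_dims = None
    # First pass: count the vectors and read the dimension off the first line.
    with open(src_path, encoding=encoding) as src:
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            if num_dims is None:
                num_dims = len(line.split()) - 1  # first token is the word
            num_words += 1
    if num_dims is None:
        raise ValueError(f"{src_path!r} contains no vectors")
    # Second pass: write the header, then copy the vectors unchanged.
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        dst.write(f"{num_words} {num_dims}\n")
        for line in src:
            if line.strip():
                dst.write(line)
    return num_words, num_dims
```

Reading the file twice keeps memory use flat for large embedding files, at the cost of a second pass over the disk.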

You can do it manually, but gensim also provides the glove2word2vec script mentioned in the question for the glove -> w2v direction.

Upvotes: 3
