Reputation: 6680
I did some research and found that gensim has a script, glove2word2vec, to convert GloVe to word2vec. I am looking to do the opposite.
Is there any simple way to convert using gensim or any other library?
Upvotes: 3
Views: 3184
Reputation: 583
The only difference between the GloVe vector file format and the word2vec file format is one line at the beginning of the word2vec .txt file, which has the form
<num words> <num dimensions>
Otherwise the vectors are represented in the same manner, so we do not need to change the vectors to change the format.
Quoting the page you linked in the question:
Both files are
presented in text format and almost identical except that word2vec includes
number of vectors and its dimension which is only difference regard to GloVe.
Notes
-----
GloVe format (real example can be founded `on Stanford size <https://nlp.stanford.edu/projects/glove/>`_) ::
word1 0.123 0.134 0.532 0.152
word2 0.934 0.412 0.532 0.159
word3 0.334 0.241 0.324 0.188
...
word9 0.334 0.241 0.324 0.188
Word2Vec format (real example can be founded `on w2v old repository <https://code.google.com/archive/p/word2vec/>`_) ::
9 4
word1 0.123 0.134 0.532 0.152
word2 0.934 0.412 0.532 0.159
word3 0.334 0.241 0.324 0.188
...
word9 0.334 0.241 0.324 0.188
In the above example, word2vec's first line, 9 4, tells us that we have 9 words in the vocabulary, each with 4 dimensions.
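To make the header concrete, here is a minimal sketch of reading it. The function name parse_w2v_header is made up for illustration and is not part of gensim; it assumes the first line holds exactly two whitespace-separated integers.

```python
from typing import Tuple


def parse_w2v_header(first_line: str) -> Tuple[int, int]:
    """Parse a word2vec text header into (num_words, num_dimensions).

    Hypothetical helper for illustration; assumes the line looks like "9 4".
    """
    num_words, num_dims = first_line.split()
    return int(num_words), int(num_dims)


# The header from the example above: 9 words, 4 dimensions each.
num_words, num_dims = parse_w2v_header("9 4\n")
```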
TL;DR
So, to convert from w2v -> glove: remove the <num words> <num dimensions> line from the w2v file. You can infer it from the file anyway.
To convert from glove -> w2v: add the <num words> <num dimensions> line to the glove file.
You can do it manually, but gensim's glove2word2vec script already handles the glove -> w2v direction.
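If you prefer to do it by hand, the two conversions can be sketched with plain file I/O. This is only a sketch under a few assumptions: both files are UTF-8 text with one vector per line and space-separated values, and the function names w2v_to_glove and glove_to_w2v are made up here (they are not gensim functions).

```python
def w2v_to_glove(src_path, dst_path):
    """Copy a word2vec text file to dst_path, dropping its header line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        next(src, None)  # skip the "<num words> <num dimensions>" line
        for line in src:
            dst.write(line)


def glove_to_w2v(src_path, dst_path):
    """Copy a GloVe text file to dst_path, prepending a computed header."""
    num_words = 0
    num_dims = 0
    # First pass: count the vectors and read the dimension off the first one.
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            if num_words == 0:
                num_dims = len(line.rstrip().split(" ")) - 1
            num_words += 1
    # Second pass: write the header, then the vectors unchanged.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(f"{num_words} {num_dims}\n")
        for line in src:
            if line.strip():
                dst.write(line)
```

For example, with hypothetical file names, w2v_to_glove("vectors.w2v.txt", "vectors.glove.txt") writes the header-less copy, and glove_to_w2v("vectors.glove.txt", "vectors.w2v.txt") goes back. Reading the file twice keeps memory use flat for large embedding files, at the cost of a second pass.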
Upvotes: 3