Reputation: 25387
I have a ready-to-go word2vec model that I already trained. I have serialized it as a CSV file:
word, v0, v1, ..., vN
house, 0.1234, 0.4567, ..., 0.3461
car, 0.456, 0.677, ..., 0.3461
What I'd like to know is how I can load that word vector model in gensim
and use that to train a paragraph or doc2vec model.
This Doc2Vec tutorial says I can load a model in the form of a "C text format" file, but I have no idea what that actually means. What is "C text format" in the first place, and more importantly:
How do I build the vocabulary from my word2vec model?
Upvotes: 1
Views: 2762
Reputation: 54173
Doc2Vec does not need word-vectors as an input: it will create any word-vectors that are needed during its own training. (And some modes, like pure DBOW with dm=0, dbow_words=0, don't use or train word-vectors at all.)
Seeding a Doc2Vec model with word-vectors might help or hurt; there's not much theory or many published results to offer guidance. There's an experimental method on Word2Vec, intersect_word2vec_format(), that can merge word2vec-c-format vectors into a model with an existing vocabulary, but you'd need to review the source to really understand its assumptions.
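As for what "C text format" means: it is the plain-text layout written by the original word2vec C tool. The first line holds the vocabulary size and the vector dimensionality, and each following line holds one word followed by its space-separated values. If you do want to experiment with your CSV, a rough sketch of a converter could look like the following. The function name csv_to_word2vec_text and the has_header flag are made up for illustration, and the sketch assumes the comma-separated layout shown in the question with no spaces or commas inside the words themselves.

```python
import csv


def csv_to_word2vec_text(csv_path, out_path, has_header=True):
    """Rewrite 'word, v0, ..., vN' CSV rows in word2vec C text format.

    Hypothetical helper, not part of gensim. Returns (vocab_size, dim).
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Strip the padding after each comma and skip blank lines.
        rows = [[cell.strip() for cell in row] for row in csv.reader(f) if row]

    if has_header:
        rows = rows[1:]  # drop the 'word, v0, v1, ...' line
    if not rows:
        raise ValueError("no vectors found in " + csv_path)

    dim = len(rows[0]) - 1  # every column after the word is a vector value

    with open(out_path, "w", encoding="utf-8") as out:
        # Header line of the text format: "<vocab_size> <vector_size>"
        out.write(f"{len(rows)} {dim}\n")
        for row in rows:
            word, values = row[0], row[1:]
            if len(values) != dim:
                raise ValueError(f"expected {dim} values for {word!r}")
            out.write(word + " " + " ".join(values) + "\n")

    return len(rows), dim
```

The file this writes can then be given as the filename to intersect_word2vec_format() with binary=False, or loaded on its own with gensim's KeyedVectors.load_word2vec_format(path, binary=False) to check that it parses.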
Upvotes: 1