gensim Doc2Vec gives non-deterministic results

Question

I am using the Doc2Vec model in gensim python library.

Every time I feed the model the same sentence data, with the seed parameter of Doc2Vec set to a fixed number, the model gives different vectors after it is built.

For testing purposes, I need a deterministic result every time I give it unchanged input data. I searched a lot and did not find a way to keep gensim's results unchanged.

Is there anything wrong with the way I use it? Thanks in advance for any replies.

Here is my code:

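from gensim.models.doc2vec import Doc2Vec
# `sentences` is assumed to be an iterable of TaggedDocument objects, defined elsewhere.
# Note: gensim 4.x renamed `size` to `vector_size` and `model.docvecs` to `model.dv`.
model = Doc2Vec(sentences, dm=1, dm_concat=1, size=100, window=5, hs=0, min_count=10, seed=64)
result = model.docvecs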

gojomo · Accepted Answer

The Doc2Vec algorithm makes use of randomness in both initialization and training. Efficient multi-threaded training introduces further randomness, because the batches spread across multiple threads won't necessarily be trained in the same order from run to run.
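If exact repeatability matters more than training speed (as in a test suite), the two sources of randomness named above can be pinned down. The following is a minimal sketch, not a definitive recipe: the helper names `reproducible_params`, `hash_seed_is_fixed`, and `train_reproducibly` are made up for illustration, while `seed` and `workers` are real Doc2Vec parameters. It also assumes, following gensim's documentation for `seed`, that runs launched in separate Python 3 interpreters need a fixed `PYTHONHASHSEED` to match.

```python
import os


def reproducible_params(seed=64):
    """Keyword arguments meant to remove the two sources of randomness
    described above: random initialization and thread scheduling."""
    return {
        "seed": seed,   # fixes the model's random number generator
        "workers": 1,   # one worker thread, so batch order cannot vary
    }


def hash_seed_is_fixed():
    """True if PYTHONHASHSEED was set to a fixed value before Python started."""
    return os.environ.get("PYTHONHASHSEED", "random") != "random"


def train_reproducibly(documents, seed=64, **other_params):
    """Hypothetical wrapper: `documents` must be an iterable of TaggedDocument.
    `other_params` must not itself contain `seed` or `workers`."""
    # Imported here so the helpers above can be used without gensim installed.
    from gensim.models.doc2vec import Doc2Vec

    if not hash_seed_is_fixed():
        print("warning: PYTHONHASHSEED is not fixed; runs started in "
              "separate interpreters may still differ")
    return Doc2Vec(documents, **other_params, **reproducible_params(seed))
```

The hash seed has to be set before the interpreter starts, for example `PYTHONHASHSEED=0 python train.py`; assigning to `os.environ` inside the running script does not change it. The cost of `workers=1` is noticeably slower training.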

If the model is training well, the jitter in results from run-to-run shouldn't be large, and the quality of downstream assessments shouldn't vary much. If the quality-of-results does vary a lot, there are likely other problems with the application of the algorithm to your data or training.
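One way to check that the jitter is small is to compare what the vectors say about the documents rather than the raw coordinates, since coordinates from separately trained models are not expected to line up. The sketch below is only an illustration with made-up toy vectors: `neighbor_overlap` is a hypothetical helper, and it assumes each run has been exported as a plain dict mapping document tag to a list of floats.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_neighbors(vectors, tag, k):
    """Tags of the k documents most similar to `tag` within one run."""
    others = [t for t in vectors if t != tag]
    others.sort(key=lambda t: cosine(vectors[tag], vectors[t]), reverse=True)
    return set(others[:k])


def neighbor_overlap(run_a, run_b, k=1):
    """Average fraction of top-k neighbors shared between two runs.
    Assumes both runs contain the same tags and more than k documents."""
    if set(run_a) != set(run_b) or len(run_a) <= k:
        raise ValueError("runs must share the same tags and have more than k documents")
    shared = [
        len(top_k_neighbors(run_a, tag, k) & top_k_neighbors(run_b, tag, k)) / k
        for tag in run_a
    ]
    return sum(shared) / len(shared)


# Toy data: run_b is run_a with its two axes swapped, so every raw vector
# differs while the neighborhood structure is the same.
run_a = {"d0": [1.0, 0.0], "d1": [0.9, 0.1], "d2": [0.0, 1.0], "d3": [0.1, 0.9]}
run_b = {"d0": [0.0, 1.0], "d1": [0.1, 0.9], "d2": [1.0, 0.0], "d3": [0.9, 0.1]}

print(neighbor_overlap(run_a, run_b, k=1))
```

With real models, an overlap that stays high across runs suggests the run-to-run differences are harmless; a low or erratic overlap points toward the data or training problems mentioned above.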

Separately: you almost certainly don't want to be using the non-default dm_concat=1 mode. It results in a much larger, much slower-to-train model, and there aren't any clear public examples of it being worth that extra cost. (I'd only try it if I had a strong baseline result from simpler modes, and lots and lots of data and time.)
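To give a rough sense of "much larger", here is a back-of-the-envelope sketch. It assumes the concatenated input consists of one slot for the document vector plus 2 * window slots for the context word vectors, which is my understanding of how gensim sizes this layer; `dm_input_width` is a hypothetical helper, not a gensim function.

```python
def dm_input_width(vector_size, window, dm_concat, dm_tag_count=1):
    """Width of the input fed to the prediction layer in DM mode.

    Without concatenation the context vectors are summed or averaged, so the
    width stays at vector_size. With concatenation (assumed layout) every
    context slot keeps its own vector_size-wide segment.
    """
    if dm_concat:
        return (dm_tag_count + 2 * window) * vector_size
    return vector_size


# Settings from the question: size=100, window=5.
print(dm_input_width(100, 5, dm_concat=False))  # 100
print(dm_input_width(100, 5, dm_concat=True))   # 1100
```

Under that assumption the output weights carry 1100 values per vocabulary word instead of 100, which is where the extra memory and training time would come from.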
