Reputation: 371
I use Gensim's Doc2Vec model to train document vectors. I printed out the representation for the word 'good' after every epoch and found that it is not updating, while the representation for the document with id '3' is different every epoch!
My code is below; I don't know what is happening.
model = gensim.models.Doc2Vec(dm=0, alpha=0.1, size=20, min_alpha=0.025)
model.build_vocab(documents)
print('Building model....', (time4 - time3))
for epoch in range(10):
    model.train(documents)
    print('Now training epoch %s' % epoch)
    print(model['good'])
    print(model.docvecs[str(3)])
Upvotes: 1
Views: 997
Reputation: 54153
The pure PV-DBOW model (dm=0) doesn't involve use or training of word-vectors at all. (It's just an artifact of the code shared with Word2Vec that they're allocated and randomly initialized at all.)
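To make the mechanism concrete, here is a toy, pure-Python sketch of PV-DBOW with negative sampling. It is not Gensim's implementation; the function name train_pv_dbow and all of its parameters are made up for illustration. The point is only that the update loop touches the document vectors and the output-layer weights, while the input word vectors are allocated and then never read or written.

```python
import math
import random


def train_pv_dbow(docs, dim=8, epochs=20, lr=0.05, negatives=3, seed=0):
    """Toy PV-DBOW: each document vector learns to predict the words it contains."""
    rng = random.Random(seed)
    vocab = sorted({w for words in docs for w in words})
    index = {w: i for i, w in enumerate(vocab)}

    def rand_vec():
        return [(rng.random() - 0.5) / dim for _ in range(dim)]

    word_in = [rand_vec() for _ in vocab]      # input word vectors: allocated, never trained
    word_out = [[0.0] * dim for _ in vocab]    # output-layer weights, one row per word
    doc_vecs = [rand_vec() for _ in docs]      # one trainable vector per document

    word_in_before = [v[:] for v in word_in]
    doc_vecs_before = [v[:] for v in doc_vecs]

    for _ in range(epochs):
        for d, words in enumerate(docs):
            for w in words:
                target = index[w]
                # One positive sample plus a few random negative samples.
                samples = [(target, 1.0)]
                samples += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                grad = [0.0] * dim
                for j, label in samples:
                    if label == 0.0 and j == target:
                        continue  # a negative sample must not be the true word
                    score = sum(a * b for a, b in zip(doc_vecs[d], word_out[j]))
                    g = lr * (label - 1.0 / (1.0 + math.exp(-score)))
                    for k in range(dim):
                        grad[k] += g * word_out[j][k]
                        word_out[j][k] += g * doc_vecs[d][k]
                # Only the document vector receives the accumulated gradient.
                for k in range(dim):
                    doc_vecs[d][k] += grad[k]

    return word_in_before, word_in, doc_vecs_before, doc_vecs


w_before, w_after, d_before, d_after = train_pv_dbow(
    [["good", "food"], ["bad", "food"], ["good", "service"]]
)
print("word vectors changed:", w_before != w_after)
print("doc vectors changed:", d_before != d_after)
```

That mirrors what the question observes: the vector for 'good' stays at its random initial value while the document vectors move.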
If you want word-vectors to be trained in an interleaved fashion, you must use the non-default dbow_words=1 parameter. (Or, switch to PV-DM mode, dm=1, where word-vectors are inherently involved.)
Upvotes: 3
Reputation: 23
Gensim initializes every word vector with random values before training starts. In doc2vec (or word2vec), the final vector for a given word (e.g. 'good') should not be expected to come out the same from one training run to the next, but similar words should end up with similar vectors. For example, after one run:
model['good'] = [0.22 0.52 0.36]
model['better'] = [0.24 0.50 0.39]
and after another run:
model['good'] = [0.58 0.96 0.24]
model['better'] = [0.59 0.90 0.21]
Upvotes: 0
Reputation: 4898
This is not the correct way to check representations after every update.
Gensim's Doc2Vec uses an iter parameter to define the number of epochs (see docs); its default value is 5.
Essentially, what is happening in the following loop:
for epoch in range(10):
    model.train(documents)
is that you are calling train() 10 times, and each call itself runs 5 epochs.
I don't think Gensim at present allows you to check representations after every epoch. One crude way of doing it would be:
model.train(documents, iter=1)
print('Now training epoch %s' % epoch)
print(model['good'])
print(model.docvecs[str(3)])
model.train(documents, iter=2)
print('Now training epoch %s' % epoch)
print(model['good'])
print(model.docvecs[str(3)])
Upvotes: 1