Irene Li

Reputation: 371

Doc2vec Gensim: the word embeddings not updating during each epoch

I am using the Gensim Doc2Vec model to train document vectors. I print the representation of the word 'good' after every epoch, but it never changes, whereas the representation of the document with id '3' is different after every epoch.

My code is below; I don't know what is happening.

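import gensim

# dm=0 selects the PV-DBOW training mode
model = gensim.models.Doc2Vec(dm=0, alpha=0.1, size=20, min_alpha=0.025)

# `documents` is the tagged training corpus, defined elsewhere in the script
model.build_vocab(documents)

# time3 and time4 are timestamps taken elsewhere in the script
print('Building model....', (time4 - time3))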
for epoch in range(10):
    model.train(documents)

    print('Now training epoch %s' % epoch)
    print(model['good'])
    print(model.docvecs[str(3)])

Upvotes: 1

Views: 997

Answers (3)

gojomo

Reputation: 54153

The pure PV-DBOW model (dm=0) doesn't involve use or training of word-vectors at all. (It's just an artifact of the code shared with Word2Vec that they're allocated and randomly initialized at all.)

If you want word-vectors to be trained in an interleaved fashion, you must use the non-default dbow_words=1 parameter. (Or, switch to PV-DM mode, dm=1, where word-vectors are inherently involved.)
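To illustrate why the word-vectors stay frozen, here is a minimal pure-Python sketch of a PV-DBOW-style update with one negative sample per target word. It is not Gensim's implementation; the names `init_model`, `train_epoch`, `word_in` and `word_out` are made up for this example. The point it shows is that only the document vectors and the output-layer weights receive updates, while the input word-vectors (the ones a lookup like model['good'] would return) are allocated but never touched.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def init_model(docs, dim=4, seed=0):
    """Allocate randomly initialized vectors (hypothetical toy layout)."""
    rng = random.Random(seed)
    vocab = sorted({w for words in docs.values() for w in words})

    def rand_vec():
        return [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    return {
        "rng": rng,
        "vocab": vocab,
        "doc_vecs": {tag: rand_vec() for tag in docs},
        "word_in": {w: rand_vec() for w in vocab},       # allocated, never trained here
        "word_out": {w: [0.0] * dim for w in vocab},     # output-layer weights
    }


def train_epoch(model, docs, lr=0.1):
    """One pass: each doc vector predicts each of its words (1 negative sample)."""
    rng = model["rng"]
    for tag, words in docs.items():
        d = model["doc_vecs"][tag]
        for target in words:
            samples = [(target, 1.0)]
            negative = rng.choice(model["vocab"])
            if negative != target:
                samples.append((negative, 0.0))
            grad_d = [0.0] * len(d)
            for word, label in samples:
                o = model["word_out"][word]
                g = lr * (label - sigmoid(dot(d, o)))
                for i in range(len(d)):
                    grad_d[i] += g * o[i]   # gradient for the doc vector
                    o[i] += g * d[i]        # update output weights
            for i in range(len(d)):
                d[i] += grad_d[i]           # update the doc vector only


docs = {
    "1": ["good", "movie", "great", "plot"],
    "2": ["bad", "movie", "weak", "plot"],
    "3": ["good", "acting", "great", "cast"],
}
model = init_model(docs)
word_before = list(model["word_in"]["good"])
doc_before = list(model["doc_vecs"]["3"])

for epoch in range(3):
    train_epoch(model, docs)

print("word vector changed:", model["word_in"]["good"] != word_before)
print("doc vector changed:", model["doc_vecs"]["3"] != doc_before)
```

In this toy the word vector for 'good' is identical before and after training, while the vector for document '3' moves, which matches the behaviour described in the question.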

Upvotes: 3

Majid

Reputation: 23

In every epoch, Gensim starts from random values for the word vectors and then trains the model. In doc2vec (or word2vec), the final vector for a given word (e.g. 'good') is not expected to be the same every time, but similar words should end up with similar vectors. For example, in one epoch:

model['good'] = [0.22 0.52 0.36]
model['better'] = [0.24 0.50 0.39]

and in another epoch:

model['good'] = [0.58 0.96 0.24]
model['better'] = [0.59 0.90 0.21]
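The illustrative numbers above can be checked with a cosine similarity, which is the usual way to compare embedding vectors. This is only a small sketch using the made-up example values from this answer (not output from a real model); the `cosine` helper is written by hand for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Example values copied from the answer text, one dict per run
run_a = {"good": [0.22, 0.52, 0.36], "better": [0.24, 0.50, 0.39]}
run_b = {"good": [0.58, 0.96, 0.24], "better": [0.59, 0.90, 0.21]}

sim_a = cosine(run_a["good"], run_a["better"])
sim_b = cosine(run_b["good"], run_b["better"])
print("run A good~better:", round(sim_a, 3))
print("run B good~better:", round(sim_b, 3))
```

Both similarities come out above 0.99, even though the raw coordinates of 'good' differ between the two runs.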

Upvotes: 0

kampta

Reputation: 4898

This is not the correct way to check representations after every update. Gensim's Doc2Vec uses an iter parameter to set the number of epochs (see the docs), and its default value is 5.

Essentially what is happening in the following loop:

for epoch in range(10):
    model.train(documents)

you are training your model 10 times, and each call runs 5 epochs.

I don't think Gensim currently lets you check representations after every epoch. One crude way of doing it would be:
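To make the arithmetic concrete, here is a rough sketch that counts the passes in such a loop. It assumes that each train() call runs iter passes and decays the learning rate linearly from alpha to min_alpha within that call, which is how I understand older Gensim releases to behave (check your version). The helpers `alpha_schedule` and `simulate` are hypothetical names for this illustration, not Gensim functions.

```python
def alpha_schedule(alpha, min_alpha, passes):
    """Learning rate at the start of each pass, decaying linearly (assumed behaviour)."""
    step = (alpha - min_alpha) / passes
    return [alpha - step * i for i in range(passes)]


def simulate(calls, iter_per_call, alpha, min_alpha):
    """Concatenate the per-call schedules for repeated train() calls."""
    rates = []
    for _ in range(calls):
        rates.extend(alpha_schedule(alpha, min_alpha, iter_per_call))
    return rates


# Values from the question: alpha=0.1, min_alpha=0.025, 10 calls, default iter=5
rates = simulate(calls=10, iter_per_call=5, alpha=0.1, min_alpha=0.025)
print("total passes over the data:", len(rates))
print("rate at start of call 1:", rates[0])
print("rate at start of call 2:", rates[5])
```

Under these assumptions the loop makes 50 passes over the data rather than 10, and the learning rate jumps back up to alpha at the start of every call instead of decaying once over the whole run.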

epoch = 1
model.train(documents, iter=1)
print('Now training epoch %s' % epoch)
print(model['good'])
print(model.docvecs[str(3)])

epoch = 2
model.train(documents, iter=2)
print('Now training epoch %s' % epoch)
print(model['good'])
print(model.docvecs[str(3)])

Upvotes: 1
