Reputation: 377
I have a pre-trained model, but I need to add some new words to it.
I tried:
model.build_vocab([[new_word1, new_word2]], update=True)
model.train([[new_word1, new_word2]], total_examples=model.corpus_count, epochs=model.epochs)
But when I check:
model.wv[new_word1]
model.wv[new_word2]
I got:
KeyError: "Key {new_word1} not present"
and the same error for new_word2.
I have already checked the question "How to add words and vectors manually to Word2vec gensim?"
How can I solve it? Thanks
Upvotes: 0
Views: 368
Reputation: 54153
If you enable logging at the INFO level, you may see more hints about where things are not having the expected effect.
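A minimal sketch of turning on INFO-level logging with Python's standard logging module, run before calling build_vocab() and train(). The format string is just one reasonable choice, and force=True (Python 3.8+) is an assumption added here so the setting still applies if logging was already configured elsewhere:

```python
import logging

# Route INFO-level (and more severe) messages from all libraries,
# including gensim's own loggers, to the console.
logging.basicConfig(
    format="%(asctime)s : %(levelname)s : %(message)s",
    level=logging.INFO,
    force=True,  # assumption: replace any handlers configured earlier
)
```

With this in place, the vocabulary-building and training steps should report what they are doing, which makes it easier to spot words being dropped.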
In particular, the default min_count value used by Word2Vec is 5, meaning any words that appear fewer than 5 times in a corpus fed to .build_vocab() will be ignored. (Ignoring such rare words is almost always the right thing to do with the word2vec algorithm, which can only learn useful word-vectors when there are many varied examples of a word's usage.)
If your test is truly just 2 new words, each used just once, a model with reasonable defaults will ignore those two single-occurrence words.
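To make the effect of the threshold concrete, here is a small stand-alone illustration of a min_count-style filter. It is not gensim's actual implementation; the function name surviving_words is hypothetical, and the sketch only mimics the rule described above (words seen fewer than min_count times are discarded):

```python
from collections import Counter

def surviving_words(sentences, min_count=5):
    """Return the words that occur at least min_count times across all sentences."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}

# The same shape of input as in the question: one sentence, two new words.
tiny_update = [["new_word1", "new_word2"]]

kept_default = surviving_words(tiny_update)               # threshold of 5
kept_relaxed = surviving_words(tiny_update, min_count=1)  # threshold of 1
```

Under the threshold of 5, kept_default is empty because each word was seen only once, which matches the KeyError in the question. Only the relaxed threshold keeps both words, and a single usage example would still be too little for a meaningful vector.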
Separately: expanding the vocabulary of an existing model is a tricky, error-prone process. Most improvised/naive ways of doing it are unlikely to reliably give good results. In most cases the safer, more robust process would be re-training with all text, old and new, rather than tiny new increments.
Upvotes: 2