Reputation: 71
I have a class Featurizer that checks for the existence of an embedding_file, which contains word embeddings from the Google News vectors, and loads it when called. However, when I use the Featurizer class to load the model, it gives an error:
AttributeError Traceback (most recent call last)
d:\mt 111\QuestionAnswer\training_model.ipynb Cell 11' in <cell line: 2>()
1 emb_file = os.path.join('D:\mt 111\QuestionAnswer\embedding_file', 'GoogleNews-vectors-negative300.bin')
----> 2 featurizer = Featurizer(emb_file)
d:\mt 111\QuestionAnswer\training_model.ipynb Cell 4' in Featurizer.__init__(self, embedding_file)
11 print('INFO: Loading word vectors...')
12 self.word2vec = KeyedVectors.load_word2vec_format(
13 'GoogleNews-vectors-negative300.bin',
14 binary=True)
16 print('INFO: Done! Using %s word vectors from pre-trained word2vec.' \
---> 17 %len(self.word2vec.vocab))
File d:\mt 111\QuestionAnswer\venv\lib\site-packages\gensim\models\keyedvectors.py:735, in KeyedVectors.vocab(self)
733 @property
734 def vocab(self):
--> 735 raise AttributeError(
736 "The vocab attribute was removed from KeyedVector in Gensim 4.0.0.\n"
737 "Use KeyedVector's .key_to_index dict, .index_to_key list, and methods "
738 ".get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.\n"
739 "See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4"
740 )
AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val)instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
This is the class:
class Featurizer:
    def __init__(self, embedding_file):
        if not os.path.exists(embedding_file):
            raise IOError("Embeddings file does not exist: %s" %embedding_file)
        punctuation = string.punctuation
        punctuation = punctuation + "’" + "“" + "?" + "‘"
        self.punctuation = punctuation
        print('INFO: Loading word vectors...')
        self.word2vec = KeyedVectors.load_word2vec_format(
            embedding_file,
            binary=True)
        print('INFO: Done! Using %s word vectors from pre-trained word2vec.' \
              %len(self.word2vec.vocab))
This is how I try to load the embeddings using the Featurizer class:
emb_file = os.path.join('D:\mt 111\QuestionAnswer\embedding_file', 'GoogleNews-vectors-negative300.bin')
featurizer = Featurizer(emb_file)
Ideally, if it loaded properly, it would print the messages from the Featurizer class, such as:
emb_file = os.path.join('D:\mt 111\QuestionAnswer\embedding_file', 'GoogleNews-vectors-negative300.bin')
featurizer = Featurizer(emb_file)
INFO: Loading word vectors...
INFO: Done! Using 3000000 word vectors from pre-trained word2vec.
How can I fix this?
Upvotes: 1
Views: 1371
Reputation: 54183
The load succeeded; the failure was in your line of code that tried to report len(self.word2vec.vocab).
Let me quote the error message for the reason that your code couldn't access a .vocab property:
The vocab attribute was removed from KeyedVector in Gensim 4.0.0. Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead. See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
So, you can't use .vocab anymore, but there are several new properties listed there, like .key_to_index (a dict like vocab was) or .index_to_key (a list of all lookup keys – words – in the set-of-vectors).
Have you tried using any of those specific properties recommended in the error message you received, instead of .vocab?
Or have you tried visiting the recommended URL, which makes specific suggestions, with before-and-after code examples, for how to replace references to the no-longer-available .vocab attribute? Here are the relevant lines of things not to do (🚫), and to do instead (👍), for your case:
vocab_len = len(model.wv.vocab) # 🚫
…
vocab_len = len(model.wv) # 👍
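In the question's Featurizer, self.word2vec is already a KeyedVectors object (there is no .wv wrapper), so under Gensim 4 the count should be available directly as len(self.word2vec) or len(self.word2vec.key_to_index). As a rough sketch of how the reporting line could be made tolerant of both Gensim 3.x and 4.x, here is a small helper. The name vocab_size is hypothetical (it is not part of Gensim), and the stand-in object exists only so the snippet runs without the large GoogleNews file:

```python
from types import SimpleNamespace


def vocab_size(keyed_vectors):
    """Return the number of keys (words) in a KeyedVectors-like object.

    Hypothetical helper, not part of Gensim.
    """
    # Gensim 4.x: key_to_index is a dict mapping each key to its row index.
    if hasattr(keyed_vectors, "key_to_index"):
        return len(keyed_vectors.key_to_index)
    # Gensim 3.x fallback: .vocab was a dict keyed by word.
    return len(keyed_vectors.vocab)


# Stand-in for a loaded KeyedVectors (an assumption for illustration only);
# inside Featurizer.__init__ you would pass self.word2vec instead.
fake_vectors = SimpleNamespace(key_to_index={"king": 0, "queen": 1})

message = ('INFO: Done! Using %s word vectors from pre-trained word2vec.'
           % vocab_size(fake_vectors))
print(message)
```

If you only need to support Gensim 4, the helper is unnecessary: replacing len(self.word2vec.vocab) with len(self.word2vec) in the final print call is the smaller change.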
Upvotes: 2