Reputation: 1722
Using the Gensim package, I have trained a word2vec model on the corpus that I am working with as follows:
from gensim.models import Word2Vec
word2vec = Word2Vec(all_words, min_count=3, size=512, sg=1)
Using Numpy, I have initialized a random array with the same dimensions:
from numpy.random import rand
vector = (rand(512) - 0.5) * 20
Now, I would like to find the words in the word2vec model that are most similar to the random vector that I initialized.
For a word in the model's vocabulary, you can run:
word2vec.most_similar('word')
The output is a list of the most similar words and their corresponding similarity scores.
I would like to get a similar output for my initialized array.
However, when I run:
word2vec.most_similar(vector)
I get the following error:
Traceback (most recent call last):
  File "<ipython-input-297-3815cf183d05>", line 1, in <module>
    word2vec.most_similar(vector)
  File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\utils.py", line 1461, in new_func1
    return func(*args, **kwargs)
  File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\base_any2vec.py", line 1383, in most_similar
    return self.wv.most_similar(positive, negative, topn, restrict_vocab, indexer)
  File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 549, in most_similar
    for word, weight in positive + negative:
TypeError: cannot unpack non-iterable numpy.float64 object
What can I do to overcome this error and find the most similar words to my arrays?
I've checked this and this page. However, it is unclear to me how I could solve my problem with these suggestions.
Upvotes: 1
Views: 230
Reputation: 54243
The .most_similar() method of Gensim's KeyedVectors interface can take raw vectors as its target. However, its current argument-type detection (at least through gensim-3.8.3) would otherwise mistake a single vector for a list of keys, so you need to pass the vector explicitly as one member of a list given to the named positive parameter.
Specifically, this should work:
similars = word2vec.wv.most_similar(positive=[vector,])
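To illustrate what a lookup by raw vector computes, here is a minimal sketch in plain Python rather than Gensim itself. It assumes the ranking is by cosine similarity between the query vector and each word vector, highest first. The helper names (cosine, most_similar_to_vector) and the three-dimensional toy vocabulary are hypothetical and only stand in for a trained model:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar_to_vector(word_vectors, query, topn=10):
    """Rank every word by cosine similarity to the query vector.

    Hypothetical helper: word_vectors maps word -> vector and plays
    the role of the trained model's vocabulary.
    """
    scored = [(word, cosine(vec, query)) for word, vec in word_vectors.items()]
    # Highest similarity first, the same ordering most_similar() reports.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Toy 3-dimensional "embeddings" (made up for illustration only).
toy_vectors = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.8, 0.6, 0.0],
    "car": [0.0, 0.0, 1.0],
}

# An arbitrary query vector that is not itself a word in the vocabulary.
query = [2.0, 0.5, 0.0]
result = most_similar_to_vector(toy_vectors, query, topn=2)
print(result)
```

Because cosine similarity ignores vector length, scaling the random vector (for example the factor of 20 in the question) should not change the ranking.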
Upvotes: 1
Reputation: 22033
You are trying to see if a floating point number is similar to a string, and that doesn't work (cannot unpack non-iterable numpy.float64 object).
What you need to do is to properly generate random strings, not random floating point numbers. Once this is done, your code will work. See also the documentation that states list of str (https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.WordEmbeddingsKeyedVectors.most_similar)
Upvotes: 1