pietro

Reputation: 1

Get the closest known vector to an unknown vector with gensim

I am currently implementing a natural text generator for a school project. I have a dataset of sentences of predetermined length and keywords, which I convert into vectors using gensim and GoogleNews-vectors-negative300.bin.gz. I train a recurrent neural network to produce a list of vectors, which I compare to the list of vectors of the real sentence. So I try to get as close as possible to the "real" vectors.

My problem arises when I have to convert the vectors back into words: my output vectors aren't necessarily in the Google set. So I would like to know if there is an efficient way to find the vector in the Google set that is closest to an output vector.

I work with Python 3 and TensorFlow.

Thanks a lot, feel free to ask any questions about the project

Charles

Upvotes: 0

Views: 265

Answers (1)

gojomo

Reputation: 54153

The gensim method .most_similar() (on KeyedVectors and similar classes) also accepts raw vectors as the 'origin' from which to search.

Just be sure to explicitly name the positive parameter, which takes a list of target words or vectors that are combined to find the origin point.

For example:

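from gensim.models import KeyedVectors
# The GoogleNews file is in the binary word2vec format, so binary=True is required
gvecs = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)
target_vec = gvecs['apple']  # any raw 300-dimensional vector can be used here
similars = gvecs.most_similar(positive=[target_vec])  # list of (word, similarity) pairs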
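
If a whole sentence of output vectors needs decoding at once, the same cosine-similarity search can be sketched directly in NumPy. This is only an illustration, not gensim's own implementation: the helper nearest_words and the toy 3-dimensional vocabulary are hypothetical stand-ins for the 300-dimensional GoogleNews set.

```python
import numpy as np

def nearest_words(outputs, vectors, vocab):
    """For each row of `outputs`, return the vocab word whose vector
    has the highest cosine similarity (hypothetical helper)."""
    # Normalise rows to unit length so a dot product equals cosine similarity.
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_outputs = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
    # Similarity matrix of shape (n_outputs, n_vocab).
    sims = unit_outputs @ unit_vectors.T
    # Index of the most similar vocabulary vector for every output row.
    best = sims.argmax(axis=1)
    return [vocab[i] for i in best]

# Toy data (assumption): three words with made-up 3-dimensional vectors.
vocab = ["apple", "banana", "car"]
vectors = np.array([[1.0, 0.1, 0.0],
                    [0.8, 0.6, 0.0],
                    [0.0, 0.1, 1.0]])

# Two "network outputs" that match no vocabulary vector exactly.
outputs = np.array([[0.9, 0.0, 0.1],
                    [0.1, 0.0, 0.9]])

decoded = nearest_words(outputs, vectors, vocab)
print(decoded)
```

With the real model, the matrix and word list would presumably come from gvecs.vectors and gvecs.index_to_key (the gensim 4 attribute names; older versions differ), and normalising the vocabulary matrix once and reusing it avoids repeating that work for every sentence.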

Upvotes: 1
