Reputation: 1
I am currently implementing a natural text generator for a school project. I have a dataset of sentences of predetermined lenght and key words, I convert them in vectors thanks to gensim and GoogleNews-vectors-negative300.bin.gz. I train a recurrent neural network to create a list of vectors that I compare to the list of vectors of the real sentence. So I try to get as close as possible to the "real" vectors.
My problem happens when I have to convert back vectors into words: my vectors aren't necessarily in the google set. So I would like to know if there is an efficient solution to get the closest vector in the Google set to an outpout vector.
I work with python 3 and Tensorflow
Thanks a lot, feel free to ask any questions about the project
Charles
Upvotes: 0
Views: 265
Reputation: 54153
The gensim
method .most_similar()
(on KeyedVectors
& similar classes) will also accept raw vectors as the 'origin' from which to search.
Just be sure to explicitly name the positive
parameter - a list of target words/vectors to combine to find the origin point.
For example:
gvecs = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz')
target_vec = gvecs['apple']
similars = gvecs.most_similar(positive=[target_vec,])
Upvotes: 1