Reputation: 603
I would like to use word2vec embeddings to obtain the most likely substitute words GIVEN a context (the surrounding words), rather than supplying an individual word.
Example: sentence = 'I would like to go to the park tomorrow after school'
If I want to find candidates similar to "park", I would typically just use the similarity function from the Gensim model:
model.most_similar('park')
and obtain semantically similar words. However, this could give me words similar to the verb 'park' rather than the noun 'park', which is the sense I was after.
Is there any way to query the model and give it surrounding words as context to provide better candidates?
Upvotes: 1
Views: 848
Reputation: 54173
Word2vec is not, primarily, a word-prediction algorithm. Internally it makes rough predictions in order to train its word-vectors, but those training-time predictions usually aren't the end use for which the word-vectors are wanted.
That said, recent versions of Gensim added a predict_output_word() method that (for models trained with negative sampling) approximates the predictions made during training. It might be useful for your purposes.
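For example, here is a minimal sketch, assuming Gensim 4.x and a model trained with negative sampling (which predict_output_word() requires); the toy corpus and parameters are placeholders:

```python
from gensim.models import Word2Vec

# Toy corpus just to make the sketch self-contained; real use needs a
# much larger training corpus for meaningful predictions.
sentences = [
    ['i', 'would', 'like', 'to', 'go', 'to', 'the', 'park',
     'tomorrow', 'after', 'school'],
    ['we', 'walked', 'in', 'the', 'park', 'after', 'school'],
]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, negative=5)

# Rank vocabulary words by how likely each is as the center word,
# given the surrounding context words.
context = ['go', 'to', 'the', 'tomorrow', 'after', 'school']
print(model.predict_output_word(context, topn=10))
```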
Alternatively, checking for the words most_similar() to your initial target word that are also somewhat similar to the context words might help.
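Here's a hedged sketch of that re-ranking idea, assuming model is the trained Word2Vec from above; the helper name contextual_candidates and the pool size are my own choices, not a Gensim API:

```python
import numpy as np

def contextual_candidates(model, target, context, topn=10, pool=50):
    # Take a pool of nearest neighbors of the target word, then re-rank
    # them by their average similarity to the context words.
    # All words are assumed to be in the model's vocabulary.
    candidates = model.wv.most_similar(target, topn=pool)
    def context_score(word):
        return np.mean([model.wv.similarity(word, c) for c in context])
    return sorted(candidates, key=lambda pair: context_score(pair[0]),
                  reverse=True)[:topn]

context = ['go', 'to', 'the', 'tomorrow', 'after', 'school']
print(contextual_candidates(model, 'park', context))
```

A simpler variant is to fold the context into the query itself, e.g. model.wv.most_similar(positive=['park'] + context), which averages the vectors before searching for neighbors.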
There have been some research papers about ways to disambiguate multiple word senses (like 'to /park/ a car' versus 'walk in a /park/') during word-vector training, but I haven't seen them implemented in open source libraries.
Upvotes: 3