Reputation: 13
I have been trying to create distractors (false answers) for multiple choice questions. Using word vectors, I was able to get decent results for single-word nouns.
When dealing with compound nouns (such as "car park" or "Donald Trump"), my best attempt was to compute similar words for each part of the compound and combine them. The results are very entertaining:
Unfortunately, these are not very convincing. Especially in case of named entities, I want to produce other named entities, which appear in similar contexts; e.g:
Does anyone have any suggestions of how this could be achieved? Is there are a way to generate similar named entities/noun chunks?
I managed to get some good distractors by searching for the named entities on Wikipedia, then extracting entities which are similar from the summary. Though I'd prefer to find a solution using just spacy.
Upvotes: 1
Views: 334
Reputation: 7105
If you haven't seen it yet, you might want to check out sense2vec
, which allows learning context-sensitive vectors by including the part-of-speech tags or entity labels. Quick usage example of the spaCy extension:
s2v = Sense2VecComponent('/path/to/reddit_vectors-1.1.0')
nlp.add_pipe(s2v)
doc = nlp(u"A sentence about natural language processing.")
most_similar = doc[3]._.s2v_most_similar(3)
# [(('natural language processing', 'NOUN'), 1.0),
# (('machine learning', 'NOUN'), 0.8986966609954834),
# (('computer vision', 'NOUN'), 0.8636297583580017)]
See here for the interactive demo using a sense2vec model trained on Reddit comments. Using this model, "car park" returns things like "parking lot" and "parking garage", and "Donald Trump" gives you "Sarah Palin", "Mitt Romney" and "Barack Obama". For ambiguous entities, you can also include the entity label – for example, "Niagara Falls|GPE" will show similar terms to the geopolitical entitiy (GPE
), e.g. the city as opposed to the actual waterfalls. The results obviously depend on what was present in the data, so for even more specific similarities, you could also experiment with training your own sense2vec vectors.
Upvotes: 0