I am trying to perform a simple task: given a corpus, identify all words that are hyponyms of a certain synset (e.g., «find every mention of a "plant" or a "bird"»). In order to do that accurately, I need to do word sense disambiguation on a group of synsets for every word in my corpus.
I would like to do this with state-of-the-art methods that are available as open source.
If using a neural method, I would need a pretrained model.
I have tried the greedy approach that considers every single synset for every word. This isn't great; however, I find that traditional techniques like lesk, as provided by nltk, are even worse in practice, as I get far too many false negatives.
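To make the greedy baseline concrete, here is a minimal sketch of what I mean. It runs on a tiny hand-written taxonomy rather than WordNet itself, so every sense id and table below (`HYPERNYMS`, `SENSES`, `crane#bird`, and so on) is hypothetical toy data of my own; in practice the same lookups would come from WordNet.

```python
from typing import Dict, List, Set

# Toy stand-in for WordNet (hypothetical data): sense id -> direct hypernym ids.
HYPERNYMS: Dict[str, List[str]] = {
    "crane#bird": ["wading_bird"],
    "wading_bird": ["bird"],
    "crane#machine": ["lifting_device"],
    "oak#tree": ["tree#plant"],
    "tree#plant": ["plant"],
    "tree#diagram": ["figure"],
}

# Toy stand-in for the lemma -> candidate senses lookup (hypothetical data).
SENSES: Dict[str, List[str]] = {
    "crane": ["crane#bird", "crane#machine"],
    "tree": ["tree#plant", "tree#diagram"],
    "oak": ["oak#tree"],
}


def ancestors(sense: str, hypernyms: Dict[str, List[str]]) -> Set[str]:
    """Collect every transitive hypernym of a sense."""
    seen: Set[str] = set()
    stack = [sense]
    while stack:
        current = stack.pop()
        for parent in hypernyms.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def greedy_is_hyponym(word: str, target: str) -> bool:
    """Greedy check: accept the word if ANY of its senses falls under the target."""
    return any(
        sense == target or target in ancestors(sense, HYPERNYMS)
        for sense in SENSES.get(word, [])
    )


sentence = "the crane lifted the oak beam".split()
# No context is used, so "crane" is flagged as a bird even in this sentence.
print([w for w in sentence if greedy_is_hyponym(w, "bird")])
```

Because the check ignores the sentence entirely, every ambiguous word matches whenever any one of its senses qualifies, which is where the false positives come from.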
I see that spaCy already ships a transformer-based model with POS tagging out of the box, but the WordNet integration is supplied by an external package, and I can't find any way to do WSD with it.
I could certainly paraphrase the disambiguation query:
Which of these descriptions matches the word x in the sentence: "yyy":
- "x means aaa"
- "x means bbb"
- "x means ccc"
And feed it into an LLM, so I don't see any fundamental reason why there shouldn't be a more straightforward way to do this using modern deep learning techniques. Is there an available model that I have been unable to find?
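For concreteness, the prompt I have in mind could be assembled like this. This is only a minimal sketch: `build_wsd_prompt` is a hypothetical helper of my own, the glosses are assumed to be the candidate synset definitions, and no particular LLM or API is implied.

```python
from typing import Sequence


def build_wsd_prompt(word: str, sentence: str, glosses: Sequence[str]) -> str:
    """Build a multiple-choice disambiguation prompt, one option per candidate gloss."""
    header = (
        f'Which of these descriptions matches the word {word} '
        f'in the sentence: "{sentence}":'
    )
    options = [f'- "{word} means {gloss}"' for gloss in glosses]
    return "\n".join([header, *options])


# Placeholder values mirroring the template above.
print(build_wsd_prompt("x", "yyy", ["aaa", "bbb", "ccc"]))
```

The returned string would then be sent to whatever model is used, and the chosen option mapped back to its synset.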