InfiniteSnow

Reputation: 41

State of the art word sense disambiguation on WordNet synsets

I am trying to perform a simple task: given a corpus, identify all words that are hyponyms of a certain synset (e.g., «find every mention of a "plant" or a "bird"»). To do that accurately, I need to run word sense disambiguation over the candidate synsets of every word in my corpus.

I would like to do this with state-of-the-art methods that are available as open source.

If using a neural method, I would need a pretrained model.

I have tried the greedy approach that considers every single synset for every word. This isn't great; however, I find that traditional techniques such as Lesk, as provided by NLTK, are even worse in practice, as I get way too many false negatives.
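
For concreteness, the greedy baseline amounts to something like the sketch below. It is a minimal illustration rather than my actual code: `TOY_SENSES` and `TOY_HYPERNYMS` are made-up stand-ins for WordNet so that the snippet runs without any data download, and the sense names are hypothetical. With NLTK, the two lookups would be `wordnet.synsets(word)` and `synset.hypernyms()`.

```python
# Toy stand-in for WordNet (all entries are illustrative, not real synset IDs):
# each word maps to its candidate senses, each sense to its direct hypernyms.
TOY_SENSES = {
    "crane": ["crane.n.bird", "crane.n.machine"],
    "oak": ["oak.n.tree"],
    "plant": ["plant.n.organism", "plant.n.factory"],
}
TOY_HYPERNYMS = {
    "crane.n.bird": ["bird.n"],
    "crane.n.machine": ["device.n"],
    "oak.n.tree": ["tree.n"],
    "tree.n": ["plant.n.organism"],
    "bird.n": ["animal.n"],
}


def is_hyponym_of(sense, target):
    """Return True if `target` is `sense` itself or one of its ancestors."""
    stack, seen = [sense], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(TOY_HYPERNYMS.get(current, []))
    return False


def greedy_matches(tokens, target):
    """Flag a token if ANY of its senses falls under `target` (no disambiguation)."""
    return [
        tok for tok in tokens
        if any(is_hyponym_of(s, target) for s in TOY_SENSES.get(tok, []))
    ]
```

With this toy data, `greedy_matches(["the", "crane", "lifted", "the", "oak"], "bird.n")` flags "crane" even though the sentence is about a machine, which is exactly the kind of false positive that proper disambiguation should remove.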

I see that spaCy already ships a transformer-based model with POS tagging out of the box, but the WordNet integration is supplied by an external package, and I can't find any way to do WSD with it.

I could certainly paraphrase the disambiguation query:

Which of these descriptions matches the word x in the sentence: "yyy":

  1. "x means aaa"
  2. "x means bbb"
  3. "x means ccc"

and feed it into an LLM, so I see no fundamental reason why there shouldn't be a more straightforward way to do this with modern deep learning techniques. Is there an available model that I am failing to find?
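
The query template above could be assembled with a small helper like the following. This is only a sketch: `build_wsd_prompt` is a hypothetical name, and I am assuming the glosses would come from the WordNet definitions of the candidate synsets (in NLTK, `synset.definition()`). The call to the LLM itself is deliberately left out.

```python
def build_wsd_prompt(word, sentence, glosses):
    """Format a multiple-choice sense-selection prompt for one target word."""
    lines = [
        f'Which of these descriptions matches the word "{word}" '
        f'in the sentence: "{sentence}"?',
        "",
    ]
    # One numbered option per candidate sense, in the order given.
    for i, gloss in enumerate(glosses, start=1):
        lines.append(f'  {i}. "{word} means {gloss}"')
    return "\n".join(lines)


# Example glosses are paraphrased by hand for illustration only.
prompt = build_wsd_prompt(
    "crane",
    "The crane flew south.",
    ["a large wading bird", "a machine for lifting heavy loads"],
)
```

The model's answer (an option number) would then be mapped back to the synset at that position in the candidate list.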

Upvotes: 1

Views: 105

Answers (0)