robinhood91

Reputation: 1691

What's the best WordNet function for similarity between words?

I want to compute similarities between roughly 10,000 words. I'm using the "word.path_similarity(otherword)" method of the wordnet library, but the path_similarity results all fall in the range 0-0.1 instead of being distributed over 0-1. How is it possible that similarities between 10,000 random words all end up in that narrow range?

Is there a better way to use WordNet for finding similarity between two words?

Upvotes: 0

Views: 900

Answers (1)

David Marx

Reputation: 8558

For context, here's how this is calculated:

  1. Calculate the length of the shortest path between the two synsets/words (inclusive).

  2. Return the score as 1/pathlen.

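Therefore a score below 0.2 indicates a path length of more than 5 steps. Since the path length counts the two input synsets themselves, that means there are at least 4 synsets between them. By the same arithmetic, scores below 0.1 like the ones you're seeing correspond to path lengths of more than 10.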
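To make the arithmetic concrete, here's a minimal sketch of the two steps above on a made-up taxonomy. The graph, the node names, and the helper functions are all hypothetical and only for illustration; this is not WordNet data and not the library's actual code, just a breadth-first search that counts the nodes on the shortest path (inclusive) and returns 1/pathlen.

```python
from collections import deque

# Hypothetical toy taxonomy (undirected "is-a" links), NOT real WordNet data.
EDGES = [
    ("entity", "animal"),
    ("entity", "artifact"),
    ("animal", "dog"),
    ("animal", "cat"),
    ("artifact", "vehicle"),
    ("vehicle", "car"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_similarity(graph, start, goal):
    """Return 1 / pathlen, where pathlen counts the nodes on the
    shortest path including both endpoints. Returns None if either
    node is unknown or no path exists."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 1)])  # (node, nodes on the path so far)
    seen = {start}
    while queue:
        node, pathlen = queue.popleft()
        if node == goal:
            return 1 / pathlen
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, pathlen + 1))
    return None


GRAPH = build_graph(EDGES)

print(path_similarity(GRAPH, "dog", "dog"))  # same node: pathlen 1
print(path_similarity(GRAPH, "dog", "cat"))  # dog, animal, cat: pathlen 3
print(path_similarity(GRAPH, "dog", "car"))  # pathlen 6, so score < 0.2
```

In this toy graph, "dog" and "car" are joined by a path of 6 nodes with 4 nodes between them, which is exactly the "score below 0.2" case. Even two fairly ordinary concepts end up with a low score once the path has to climb up to a shared ancestor and back down.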

With that said: your complaint seems to be "according to this metric, two words chosen at random are pretty consistently unrelated! What's going on?" Well, your similarity metric is telling you that random words are generally not closely related. This shouldn't be that surprising. Why are you calculating similarities between random words to begin with?

Upvotes: 3
