Reputation: 1691
I want to compute similarities between roughly 10,000 words. I'm using the "word.path_similarity(otherword)" method of the WordNet library, but the path_similarity results all fall in the range 0-0.1 instead of being distributed over 0-1. How is it possible that similarities between 10,000 random words all end up in that narrow range?
Is there a better way to use WordNet for finding similarity between two words?
Upvotes: 0
Views: 900
Reputation: 8558
For context, here's how this is calculated:
Calculate the length of the shortest path between the two synsets/words (inclusive).
Return the score as 1/pathlen
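Therefore a score below 0.2 indicates a path length greater than 5. Since the path length counts the two input synsets themselves, that means there are at least 4 synsets between them. By the same arithmetic, scores below 0.1 imply a path length greater than 10.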
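To make the arithmetic concrete, here is a minimal sketch of the 1/pathlen calculation on a tiny made-up taxonomy. It does not call WordNet; the edge list and the names `TOY_EDGES`, `build_graph`, `path_len_inclusive`, and `toy_path_similarity` are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical toy taxonomy (is-a links, treated as undirected).
# This is NOT real WordNet data; it only illustrates the arithmetic.
TOY_EDGES = [
    ("entity", "animal"),
    ("animal", "dog"),
    ("animal", "cat"),
    ("entity", "artifact"),
    ("artifact", "vehicle"),
    ("vehicle", "car"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_len_inclusive(graph, start, goal):
    """Breadth-first search for the shortest path.

    Returns the number of nodes on that path, counting both endpoints,
    or None if either node is unknown or no path exists.
    """
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 1)])  # (node, nodes visited so far, inclusive)
    while queue:
        node, count = queue.popleft()
        if node == goal:
            return count
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, count + 1))
    return None


def toy_path_similarity(graph, a, b):
    """Score a pair as 1 / (inclusive shortest-path length)."""
    length = path_len_inclusive(graph, a, b)
    return None if length is None else 1.0 / length


GRAPH = build_graph(TOY_EDGES)

for pair in [("dog", "dog"), ("dog", "cat"), ("dog", "car")]:
    print(pair, path_len_inclusive(GRAPH, *pair), toy_path_similarity(GRAPH, *pair))
```

In this toy graph, "dog" and "cat" are 3 nodes apart (score 1/3), while "dog" and "car" are 6 nodes apart (score 1/6, already below 0.2). Real WordNet hierarchies are much deeper, so unrelated words drift toward very small scores quickly.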
With that said: your complaint seems to be "according to this metric, two words chosen at random are pretty consistently unrelated! What's going on?" Well, your similarity metric is telling you that random words are generally not closely related. This shouldn't be that surprising. Why are you calculating similarities between random words to begin with?
Upvotes: 3