joasa

Reputation: 976

Python make clusters of synonyms

I have a long list of words:

verbs = ['be','have', 'find', 'use', 'show', 'increase', 'detect', 'do', 'determine', 'demonstrate', 'observe','suggest', ...]

I want to cluster these words by synonymy (semantic closeness): compare each element of the list with all the others and group together those with a similarity score > 0.7. I am using WordNet, but I keep getting this error:

for i, verb in enumerate(verbs):
    for j in range(i + 1, len(verbs)):
        verbs[i].wup_similarity(verbs[j])


    ERROR MESSAGE : 
    ---->        verbs[i].wup_similarity(verbs[j])
    ---->        AttributeError: 'str' object has no attribute 'wup_similarity'

Maybe that's not even the right approach, but can anyone help?

Upvotes: 2

Views: 459

Answers (1)

Blupper

Reputation: 398

Regarding the updated question, this solution works on my machine.

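from nltk.corpus import wordnet  # needs the WordNet corpus: nltk.download('wordnet')

verbs = ['be', 'have', 'find', 'use', 'show', 'increase', 'detect', 'do', 'determine', 'demonstrate', 'observe', 'suggest']

for i, verb in enumerate(verbs):
    for j in range(i + 1, len(verbs)):
        # Look up the first verb sense ('.v.01') of each word.
        v1 = wordnet.synset(verb + '.v.01')
        v2 = wordnet.synset(verbs[j] + '.v.01')
        wup_score = v1.wup_similarity(v2)
        # wup_similarity may return None, so check before comparing.
        if wup_score is not None and wup_score > 0.7:
            print(f"{verb} and {verbs[j]} are similar")
            # or do whatever you want to do with similar words.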
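
If you want actual groups rather than printed pairs, one option is to merge every pair above the threshold into the same cluster with a small union-find. This is only a sketch: `cluster_by_similarity` and `wordnet_verb_similarity` are names I made up, and merging transitively (a~b and b~c puts a, b, c together even if a and c score low) is an assumption about what "cluster" should mean here.

```python
from itertools import combinations


def cluster_by_similarity(items, similarity, threshold=0.7):
    """Group items so that any pair scoring above threshold shares a cluster.

    `similarity` is any function (a, b) -> float or None (hypothetical helper).
    Merging is transitive, i.e. clusters are connected components.
    """
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        score = similarity(a, b)
        if score is not None and score > threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), []).append(item)
    return list(clusters.values())


def wordnet_verb_similarity(a, b):
    """Wu-Palmer score between the first verb senses of two words (0.0 if missing)."""
    # Imported here so the clustering function works without nltk installed.
    from nltk.corpus import wordnet

    senses_a = wordnet.synsets(a, pos=wordnet.VERB)
    senses_b = wordnet.synsets(b, pos=wordnet.VERB)
    if not senses_a or not senses_b:
        return 0.0
    return senses_a[0].wup_similarity(senses_b[0]) or 0.0
```

With nltk and the WordNet corpus installed, `cluster_by_similarity(verbs, wordnet_verb_similarity)` should return a list of lists of verbs. Words that are not similar to anything end up in a cluster of their own.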

Regarding the original question:

I'm no expert in this, so this may not help at all. Your code currently calls str.wup_similarity(str). However, according to this documentation (search for 'wup_similarity' on that website), I think it should be synset1.wup_similarity(synset2).

So my proposal would be to do:

for i, verb in enumerate(verbs):
    for j in range(i + 1, len(verbs)):
        # wordnet.synsets() takes the word itself, so index the list (verbs[i]),
        # not the characters of the current word (verb[i]).
        for syni in wordnet.synsets(verbs[i], pos=wordnet.VERB):
            for synj in wordnet.synsets(verbs[j], pos=wordnet.VERB):
                # Compare Synset objects, not strings.
                syni.wup_similarity(synj)

Upvotes: 2
