samsamara

Clustering semantically related words from a list of words

I have a word list containing about 30,000 unique words.
I would like to group these words based on how similar they are. Can I create an ontology tree from this list, possibly with the help of WordNet?

So essentially what I want to do is aggregate these words in some meaningful way to reduce the size of the list.
What kind of techniques can I use to do this?

Answers (1)

Ian Mercer

You could certainly use WordNet to take a first step towards clustering these words according to their synsets. In addition to 'same meaning' (synonymy) and 'opposite meaning' (antonymy) relationships, WordNet also includes 'is a' (hypernymy) and 'part of' (meronymy) relationships. Following the 'is a' relationships upward from the word 'beer', for example, visits all of these broader synsets: Brew1, Alcohol1, Drug_of_abuse1, Drug1, Agent3, Substance7, Matter3, Physical_entity1, Entity1, Causal_agent1, Beverage1, Liquid1, Fluid1, Substance1, Part1, Relation1, Abstraction6, Food1.
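
One way to turn those 'is a' chains into clusters is to cut each word's hypernym path at a fixed depth and group the words that share the ancestor found there. This depth cut is my own assumption, not the only option. To stay runnable without downloads, the demo uses a small hand-made table (`TOY_PATHS`, hypothetical and simplified, not real WordNet data). `cluster_by_ancestor`, `toy_paths` and `wordnet_paths` are hypothetical names. `wordnet_paths` shows how the same function could be fed from NLTK's WordNet interface (`wn.synsets` and `Synset.hypernym_paths()`), assuming `nltk` is installed and the `wordnet` corpus has been downloaded.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical, simplified hypernym paths (root first, the word's own node last).
# They stand in for WordNet data so the example runs without any downloads.
TOY_PATHS: Dict[str, List[str]] = {
    "beer": ["entity", "physical_entity", "matter", "food", "beverage", "alcohol", "beer"],
    "wine": ["entity", "physical_entity", "matter", "food", "beverage", "alcohol", "wine"],
    "water": ["entity", "physical_entity", "matter", "food", "beverage", "water"],
    "dog": ["entity", "physical_entity", "object", "organism", "animal", "dog"],
    "cat": ["entity", "physical_entity", "object", "organism", "animal", "cat"],
    "oak": ["entity", "physical_entity", "object", "organism", "plant", "oak"],
}


def toy_paths(word: str) -> List[List[str]]:
    """Return every hypernym path for `word` (here at most one), or [] if unknown."""
    return [TOY_PATHS[word]] if word in TOY_PATHS else []


def wordnet_paths(word: str) -> List[List[str]]:
    """Same contract as toy_paths, backed by NLTK's WordNet (first sense only).

    Assumes `pip install nltk` and `nltk.download("wordnet")` have been run.
    """
    from nltk.corpus import wordnet as wn  # lazy import: the toy demo doesn't need NLTK

    synsets = wn.synsets(word)
    if not synsets:
        return []
    # hypernym_paths() returns root-to-synset lists of Synset objects.
    return [[s.name() for s in path] for path in synsets[0].hypernym_paths()]


def cluster_by_ancestor(
    words: List[str],
    paths_for: Callable[[str], List[List[str]]],
    depth: int,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group words by the ancestor `depth` steps below the root on their first path.

    Words whose path is shorter than `depth` are keyed by their own (deepest) node.
    Words with no path at all are returned separately as unmatched.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    unmatched: List[str] = []
    for word in words:
        paths = paths_for(word)
        if not paths:
            unmatched.append(word)
            continue
        path = paths[0]  # first path only; real WordNet words can have several
        key = path[min(depth, len(path) - 1)]
        clusters[key].append(word)
    return dict(clusters), unmatched


if __name__ == "__main__":
    words = ["beer", "wine", "water", "dog", "cat", "oak", "blockchain"]
    clusters, unmatched = cluster_by_ancestor(words, toy_paths, depth=4)
    print(clusters)   # {'beverage': ['beer', 'wine', 'water'], 'animal': ['dog', 'cat'], 'plant': ['oak']}
    print(unmatched)  # ['blockchain']
```

A smaller `depth` gives fewer, broader clusters (at `depth=3` the three drinks all land under `food`, and dog, cat and oak under `organism`); a larger one keeps more distinctions. Because real WordNet paths differ in length from branch to branch, a fixed depth is only a rough knob, and using only the first sense and first path ignores ambiguous words.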

But how well this works will depend on what kind of words you have, since that determines how many of them you will find in WordNet. It doesn't include inflected verb forms (tenses) as separate entries, and its set of proper nouns is neither very large nor very modern. If your 30,000 words are mostly adjectives and nouns, it should do quite well.
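
Before committing to this approach, it may be worth measuring how much of your list WordNet actually covers. This is a minimal sketch built around a hypothetical `coverage_report` helper that accepts any "is this word in the lexicon?" predicate. The demo uses a tiny made-up lexicon and irregular-form table (`TOY_LEXICON`, `TOY_IRREGULAR`, both hypothetical). `wordnet_has_entry` (also hypothetical) shows a WordNet-backed predicate via NLTK's `wn.synsets`, which reduces inflected forms such as 'dogs' to their base form before looking them up; it assumes `nltk` and its `wordnet` corpus are installed.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical mini-lexicon of base forms plus a small table of irregular forms.
TOY_LEXICON = {"beer", "run", "happy", "dog"}
TOY_IRREGULAR = {"ran": "run"}


def toy_has_entry(word: str) -> bool:
    """Look the lowercased word up directly, then via the irregular-form table."""
    w = word.lower()
    return w in TOY_LEXICON or TOY_IRREGULAR.get(w) in TOY_LEXICON


def wordnet_has_entry(word: str) -> bool:
    """WordNet-backed predicate; assumes nltk and its 'wordnet' corpus are installed."""
    from nltk.corpus import wordnet as wn  # lazy import so the toy demo runs without NLTK

    # wn.synsets() lemmatizes with WordNet's morphy before looking the word up.
    return bool(wn.synsets(word))


def coverage_report(
    words: Iterable[str], has_entry: Callable[[str], bool]
) -> Tuple[float, List[str]]:
    """Return the fraction of words found and the list of words that were not."""
    word_list = list(words)
    if not word_list:
        return 0.0, []
    missing = [w for w in word_list if not has_entry(w)]
    return (len(word_list) - len(missing)) / len(word_list), missing


if __name__ == "__main__":
    sample = ["beer", "ran", "Happy", "iPhone", "blockchain"]
    fraction, missing = coverage_report(sample, toy_has_entry)
    print(f"{fraction:.0%} covered; missing: {missing}")  # 60% covered; missing: ['iPhone', 'blockchain']
```

The words left in `missing` are the ones a WordNet-based grouping cannot place, so the fraction gives a quick sense of how far this approach will get you on the full 30,000-word list.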
