Reputation: 4750
I have a word list containing about 30000 unique words.
I would like to group this list of words based on how similar these words tend to be.
Can I create an ontology tree from this list, possibly with the help of WordNet?
So essentially what I want to do is aggregate these words in some meaningful way to reduce the size of the list.
What kind of techniques can I use to do this?
Upvotes: 0
Views: 1437
Reputation: 39277
You could certainly use WordNet as a first step towards clustering these words according to their synsets. In addition to 'same meaning' and 'opposite meaning', WordNet also includes 'kind of' and 'part of' relationships. Following these relationships for the word 'beer', for example, visits all of these containing synsets: Brew1, Alcohol1, Drug_of_abuse1, Drug1, Agent3, Substance7, Matter3, Physical_entity1, Entity1, Causal_agent1, Beverage1, Liquid1, Fluid1, Substance1, Part1, Relation1, Abstraction6, Food1.
However, how many of your words you will find in WordNet depends on what kind of words they are. It doesn't include verb tenses, and it doesn't have a very large or very modern set of proper nouns. If your 30,000 words are adjectives and nouns, it should do quite well.
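To make the grouping idea concrete, here is a minimal sketch in Python of collapsing words under a shared ancestor. The `TOY_PARENT` table, `path_from_root` and `group_by_ancestor` are hypothetical names made up for this illustration, and the table is hand-written, not real WordNet output. With NLTK you would presumably build the chains from `wordnet.synsets(word)[0].hypernym_paths()` instead, which also means choosing a word sense for each word.

```python
from collections import defaultdict

# Toy "is a kind of" links (child -> parent), hand-written for illustration.
# These are NOT real WordNet data; they only stand in for hypernym chains.
TOY_PARENT = {
    "beer": "alcohol",
    "wine": "alcohol",
    "alcohol": "beverage",
    "juice": "beverage",
    "beverage": "food",
    "bread": "food",
    "food": "entity",
    "hammer": "tool",
    "saw": "tool",
    "tool": "entity",
}

def path_from_root(word, parent=TOY_PARENT):
    """Return the chain root -> ... -> word, or None if the word is unknown."""
    if word not in parent and word not in parent.values():
        return None
    chain = [word]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain[::-1]

def group_by_ancestor(words, depth, parent=TOY_PARENT):
    """Group words under the ancestor found `depth` steps below the root."""
    groups = defaultdict(list)
    for w in words:
        path = path_from_root(w, parent)
        if path is None:
            groups[None].append(w)  # not covered by the taxonomy
        else:
            # Clamp the index so shallow words are grouped under themselves.
            groups[path[min(depth, len(path) - 1)]].append(w)
    return dict(groups)

words = ["beer", "wine", "juice", "bread", "hammer", "saw", "walked"]
groups = group_by_ancestor(words, depth=1)
```

A smaller `depth` gives fewer, broader groups, and a larger one gives more, narrower groups, so that is the knob for how far the list is reduced. Words the taxonomy does not cover (here the inflected verb "walked") land under the `None` key, which is where the coverage caveat above shows up in practice.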
Upvotes: 1