Reputation: 7067
I want to cluster words based on their semantic similarity. Currently I have a list of documents with noun phrases already detected in them. I want to cluster these noun phrases semantically, in an unsupervised way.
I have looked at WordNet and gensim. Any suggestions as to which of them can actually help with clustering words by semantic similarity?
Upvotes: 2
Views: 1995
Reputation: 4266
For similarity based on phrase co-occurrence (phrases appearing more often together in documents will be more similar), you can use gensim.
Check out its Latent Semantic Analysis (LSA, called LSI in gensim) and Latent Dirichlet Allocation (LDA) transformations: http://radimrehurek.com/gensim/tut2.html#available-transformations
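To make the co-occurrence idea concrete, here is a minimal sketch in plain Python, independent of gensim. The toy documents and the `cosine` helper are made up for illustration; each phrase is represented by the set of documents it occurs in, and two phrases count as similar when those sets overlap strongly.

```python
import math
from collections import defaultdict

# Hypothetical toy data: each document is the list of noun phrases detected in it.
docs = [
    ["neural network", "training data", "gradient descent"],
    ["neural network", "gradient descent"],
    ["stock market", "interest rate"],
    ["stock market", "interest rate", "training data"],
]

# Map each phrase to the set of document ids it occurs in
# (equivalent to a binary phrase-by-document occurrence vector).
occurrences = defaultdict(set)
for doc_id, phrases in enumerate(docs):
    for phrase in phrases:
        occurrences[phrase].add(doc_id)


def cosine(a, b):
    """Cosine similarity between the binary occurrence vectors of two phrases."""
    shared = len(occurrences[a] & occurrences[b])
    return shared / math.sqrt(len(occurrences[a]) * len(occurrences[b]))


print(cosine("neural network", "gradient descent"))  # always appear together
print(cosine("neural network", "stock market"))      # never appear together
```

Raw co-occurrence like this gets sparse and noisy on real corpora, which is the motivation for projecting into a latent space with LSI or LDA.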
Depending on what exactly you want your clusters to do, you can either use the LSI/LDA topics directly as clusters, or run a clustering algorithm on the resulting latent phrase vectors.
Upvotes: 1