Anuj

Reputation: 7067

Unsupervised semantic clustering of words in a document

I want to cluster words by semantic similarity. I currently have a list of documents in which noun phrases have already been detected. How can I group these extracted noun phrases into clusters of semantically similar phrases, without supervision?

I have looked at the WordNet and gensim libraries. Any suggestions as to which one would actually help me get clusters of words based on their semantic similarity?

Upvotes: 2

Views: 1995

Answers (1)

Radim

Reputation: 4266

For similarity based on phrase co-occurrence (phrases that appear together in the same documents more often are treated as more similar), you can use gensim.
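A minimal sketch of that co-occurrence notion of similarity, assuming each detected noun phrase is treated as a single token: every phrase becomes a binary vector over documents, and two phrases are compared by cosine similarity. The toy documents and the helper names (`phrase_doc_vectors`, `cooccurrence_similarity`) are hypothetical and not part of gensim.

```python
import math
from collections import defaultdict
from itertools import combinations


def phrase_doc_vectors(docs):
    """Map each phrase to the set of documents it occurs in (a sparse binary vector)."""
    occurrences = defaultdict(set)
    for doc_id, phrases in enumerate(docs):
        for phrase in phrases:
            occurrences[phrase].add(doc_id)
    return occurrences


def cooccurrence_similarity(docs_a, docs_b):
    """Cosine similarity of two binary phrase-by-document vectors."""
    if not docs_a or not docs_b:
        return 0.0
    # dot product = shared documents; each norm = sqrt(number of documents)
    return len(docs_a & docs_b) / math.sqrt(len(docs_a) * len(docs_b))


# Toy input (hypothetical): each document is the list of noun phrases detected in it.
docs = [
    ["bank", "loan", "interest rate"],
    ["bank", "loan", "mortgage"],
    ["river", "water", "fish"],
    ["river", "fish", "boat"],
]
occ = phrase_doc_vectors(docs)
for a, b in combinations(sorted(occ), 2):
    sim = cooccurrence_similarity(occ[a], occ[b])
    if sim > 0:
        print(f"{a!r} ~ {b!r}: {sim:.2f}")
```

Raw co-occurrence is sparse: "mortgage" and "interest rate" never share a document here, so they score 0 even though they are clearly related. Capturing that indirect relatedness is what the latent models mentioned next add.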

Check out Latent Semantic Analysis (called LSI in gensim) and Latent Dirichlet Allocation (LDA) here: http://radimrehurek.com/gensim/tut2.html#available-transformations
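In gensim, LSI is `models.LsiModel`, trained on a bag-of-words corpus built with `corpora.Dictionary`; underneath, it is a truncated SVD of the term-document matrix. To keep the sketch below dependency-free, it swaps in a tiny power-iteration SVD in pure Python instead of gensim's own algorithm, so it is only suitable for toy-sized data. Each phrase ends up with a k-dimensional latent vector, and phrases used in similar contexts point in similar directions. The helper names and example documents are hypothetical.

```python
import math
import random


def term_doc_matrix(docs):
    """Build a phrase-by-document count matrix (rows: phrases, columns: docs)."""
    vocab = sorted({p for doc in docs for p in doc})
    row = {p: i for i, p in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for p in doc:
            matrix[row[p]][j] += 1.0
    return vocab, matrix


def lsi_phrase_vectors(matrix, k, iters=200, seed=0):
    """Rank-k truncated SVD via power iteration on A·Aᵀ with deflation.

    Returns one k-dimensional vector per phrase: component j is u_j[i] * sigma_j.
    """
    n = len(matrix)
    # Phrase-phrase co-occurrence matrix C = A·Aᵀ; its eigenvalues are sigma².
    c = [[sum(x * y for x, y in zip(matrix[i], matrix[j])) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    vectors = [[0.0] * k for _ in range(n)]
    for comp in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # nothing left to extract
            v = [x / norm for x in w]
        # Rayleigh quotient vᵀCv gives the eigenvalue (sigma²) for this direction.
        lam = sum(v[i] * c[i][j] * v[j] for i in range(n) for j in range(n))
        sigma = math.sqrt(max(lam, 0.0))
        for i in range(n):
            vectors[i][comp] = v[i] * sigma
        # Deflate so the next pass converges to the next singular direction.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vectors


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy corpus (hypothetical): each document is its list of detected noun phrases.
docs = [
    ["bank", "loan", "interest rate"],
    ["bank", "loan", "mortgage"],
    ["interest rate", "mortgage", "loan"],
    ["river", "water", "fish"],
    ["river", "fish", "boat"],
    ["water", "boat", "river"],
]
vocab, counts = term_doc_matrix(docs)
vecs = lsi_phrase_vectors(counts, k=2)
for phrase, vec in zip(vocab, vecs):
    print(f"{phrase:14s} {[round(x, 3) for x in vec]}")
```

With this toy corpus the two groups of phrases come out orthogonal, while "bank" and "mortgage" get essentially the same direction even though they share only one document. For real corpora, use gensim rather than this sketch, which builds a dense phrase-by-phrase matrix.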

Depending on what exactly you want your clusters to do, you can either use the LSI/LDA topics directly as clusters, or run a clustering algorithm on the latent phrase vectors these models produce.
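Both options, sketched in pure Python. The phrase-by-topic weights are made-up numbers standing in for what a trained model gives you. With gensim's LDA, for example, `LdaModel.get_topics()` returns a topics × vocabulary matrix, and transposing it gives one row of topic weights per vocabulary entry. Option 1 assigns each phrase to its heaviest topic. For option 2 the sketch uses k-means on length-normalised vectors; that is just one common choice, not something the answer prescribes. The function names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def topic_clusters(weights):
    """Option 1: each phrase joins the topic with its largest weight.

    Assumes non-negative, LDA-style weights.
    """
    clusters = {}
    for phrase, vec in weights.items():
        best = max(range(len(vec)), key=lambda t: vec[t])
        clusters.setdefault(best, []).append(phrase)
    return clusters


def kmeans(points, k, iters=100):
    """Option 2: Lloyd's k-means with deterministic farthest-point initialisation."""
    def dist2(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


# Hypothetical phrase-by-topic weights (one row per phrase, one column per topic).
weights = {
    "bank":          [0.80, 0.05, 0.15],
    "loan":          [0.90, 0.05, 0.05],
    "mortgage":      [0.70, 0.10, 0.20],
    "river":         [0.05, 0.85, 0.10],
    "fish":          [0.10, 0.80, 0.10],
    "boat":          [0.15, 0.60, 0.25],
    "interest rate": [0.60, 0.05, 0.35],
}
print(topic_clusters(weights))

phrases = list(weights)
labels = kmeans([normalize(v) for v in weights.values()], k=2)
for j in sorted(set(labels)):
    print(j, [p for p, lab in zip(phrases, labels) if lab == j])
```

Option 1 needs no extra parameters but ties you to one cluster per topic; option 2 lets you choose the number of clusters independently of the number of topics.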

Upvotes: 1
