Reputation: 9
I am looking for advice on how to find clusters of terms that are all related to a single concept.
The goal is to improve a tag or keyword search for images that describe concepts, processes, or situations. An image may describe a brainstorming session or a particular theme. These images, which are meant to be used in PowerPoint or other presentation material, have user-contributed tags.
The issue is that our tag-based search may bring back completely unrelated images. We want to find the clusters within the tags so that we can refine the tags related to a central concept and remove the outliers that don't belong to any cluster.
For example, suppose an image had the tags meeting, planning, brainstorming, and round table. Ideally, we would want to remove round table from the cluster, as it doesn't fit the theme.
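To make the desired behaviour concrete, here is a minimal sketch of the outlier removal in plain Python. The pairwise scores in `SIMILARITY`, the helper names `similarity` and `remove_outliers`, and the 0.5 threshold are all made up for illustration; in practice the scores would have to come from some real measure such as WordNet, word embeddings, or tag co-occurrence.

```python
# Hypothetical pairwise similarity scores in [0, 1]; these numbers are
# invented for illustration and do not come from any real measure.
SIMILARITY = {
    frozenset(("meeting", "planning")): 0.7,
    frozenset(("meeting", "brainstorming")): 0.8,
    frozenset(("planning", "brainstorming")): 0.75,
    frozenset(("meeting", "round table")): 0.3,
    frozenset(("planning", "round table")): 0.1,
    frozenset(("brainstorming", "round table")): 0.2,
}


def similarity(a, b):
    """Look up the (assumed) similarity of two tags; unknown pairs score 0."""
    return SIMILARITY.get(frozenset((a, b)), 0.0)


def remove_outliers(tags, threshold=0.5):
    """Keep only tags whose mean similarity to the other tags meets the threshold."""
    kept = []
    for tag in tags:
        others = [t for t in tags if t != tag]
        if not others:
            # A single tag has nothing to be compared against, so keep it.
            kept.append(tag)
            continue
        mean_sim = sum(similarity(tag, other) for other in others) / len(others)
        if mean_sim >= threshold:
            kept.append(tag)
    return kept


tags = ["meeting", "planning", "brainstorming", "round table"]
print(remove_outliers(tags))
```

With these invented scores, round table averages 0.2 against the other three tags and is dropped, while the other tags all average above 0.5 and stay.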
I have worked with WordNet Similarity, but the results are quite strange. I was wondering whether there are any other tools in Python's NLTK that could help me solve this.
Thanks!
Upvotes: 0
Views: 717
Reputation: 7806
Your question falls in the area called "topic modeling". You can use gensim (https://radimrehurek.com/gensim/) or lda (https://pypi.python.org/pypi/lda).
Upvotes: 1