Reputation: 759
I am working on a project where I use Elasticsearch to analyze tweets. I am building a list of hot topics (people's interests) that they tweet about most frequently.
Tweets usually contain words that need to be stemmed before they can be used as a topic (interest) name.
Elasticsearch is good at search and handles stopword removal, stemming, and so on in the background, but I was wondering whether there is a way to get the stemmed words of a tweet back from the Elasticsearch API.
I think Apache Lucene can do this, but I want to stick with Elasticsearch.
Can anybody suggest a way to achieve this in Elasticsearch?
Thanks in advance!!!
Suppose we have three words, e.g. playing, played, and plays.
All of these reduce to the same word (play) after stemming, so I want to increment only the count of play (the stemmed word), not the individual counts of the three unstemmed words.
I hope this example makes my purpose clearer.
Upvotes: 2
Views: 465
Reputation: 66
You can achieve this with the snowball analyzer:
https://gist.github.com/jiren/7263138
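For readers who cannot open the gist, here is a rough sketch of what index settings with a Snowball-based analyzer might look like. It is not taken from the gist: the names `tweet_snowball` and `tweet_stemmer` are made up for illustration, and the exact settings accepted can vary between Elasticsearch versions, so treat it as a starting point rather than a definitive configuration.

```python
import json


def build_index_settings(language="English"):
    """Build index settings with a custom analyzer that stems via a snowball filter.

    The filter name "tweet_snowball" and analyzer name "tweet_stemmer" are
    hypothetical names chosen for this sketch.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Token filter of type "snowball" for the given language.
                    "tweet_snowball": {"type": "snowball", "language": language}
                },
                "analyzer": {
                    "tweet_stemmer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase, drop stopwords, then stem.
                        "filter": ["lowercase", "stop", "tweet_snowball"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(build_index_settings(), indent=2))
```

The resulting JSON would be sent as the body when creating the index; the analyzer can then be referenced by name in a field mapping or in an Analyze API call.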
Upvotes: 1
Reputation: 4492
How about using the Analyze API of Elasticsearch? It returns the tokens that an analyzer produces for a given piece of text: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html
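A minimal sketch of how this could be used for the counting described in the question, using only the Python standard library. Several things here are assumptions: the server address `http://localhost:9200`, the choice of the built-in `english` analyzer (which applies stemming and stopword removal), and the JSON request body, which is the form used by recent Elasticsearch versions (older versions took the analyzer as a query parameter). The sample response is hand-written to show the expected shape, not captured from a real server.

```python
import json
import urllib.request
from collections import Counter

ES_URL = "http://localhost:9200"  # assumption: a local Elasticsearch node


def build_analyze_request(text, analyzer="english"):
    """Build the JSON body for a call to the _analyze endpoint."""
    return {"analyzer": analyzer, "text": text}


def analyze(text, analyzer="english", base_url=ES_URL):
    """POST the text to _analyze and return the parsed JSON response."""
    body = json.dumps(build_analyze_request(text, analyzer)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_stems(analyze_response, counts=None):
    """Add every token in an _analyze response to a running Counter."""
    counts = Counter() if counts is None else counts
    counts.update(tok["token"] for tok in analyze_response.get("tokens", []))
    return counts


if __name__ == "__main__":
    # Hand-written stand-in for a response, so this runs without a server.
    sample_response = {
        "tokens": [{"token": "play"}, {"token": "play"}, {"token": "play"}]
    }
    print(count_stems(sample_response))
```

With a running node, calling `count_stems(analyze(tweet_text), counts)` for each tweet would accumulate the counts per stemmed word in a single `Counter`.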
Upvotes: 1