Reputation: 11420
I am new to Elasticsearch. I want to create a custom analyzer in Elasticsearch, with custom filters and a custom stemmer. I know that Elasticsearch is built on Lucene, and Lucene supports custom stemmers, but I am not able to find any example that shows a custom analyzer/stemmer implementation in Lucene and its integration into Elasticsearch.
Apologies for my bad English. Thanks in advance.
Edit 1
What I want is a Hinglish stemmer, which will transform the following inputs into the outputs given below:
Upvotes: 2
Views: 1479
Reputation: 11420
Finally, after several hiccups, I was able to create an implementation of a Hinglish stemmer. It is available at the following link:
https://github.com/Mangu-Singh-Rajpurohit/hinglish-stemmer/
Upvotes: 3
Reputation: 4818
I will try to write a simple answer; let me know if you have any questions.
First step: create the custom stemming file (here "custom_stems.txt"), with contents like the following, and place it in the config folder (I put it under "config/analysis/custom_stems.txt"):
rama => ram
raam => ram
sachin => sachin
sacheen => sachin
sachina => sachin
sacheena => sachin
kuldeep => kuldip
kooldeep => kuldip
kooldipa => kuldip
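Each line of a stemmer_override rules file maps one or more surface forms on the left of "=>" to the canonical token on the right (Elasticsearch also accepts comma-separated sources such as "rama, raam => ram"). As a rough illustration of how such a rules file behaves, here is a minimal Python sketch (the helper name load_rules is hypothetical, not part of any Elasticsearch API):

```python
def load_rules(lines):
    """Parse stemmer_override-style rules into a {source: target} dict."""
    rules = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        sources, target = line.split("=>")
        target = target.strip()
        # one rule may list several comma-separated source forms
        for source in sources.split(","):
            rules[source.strip()] = target
    return rules

rules = load_rules([
    "rama => ram",
    "raam => ram",
    "kuldeep, kooldeep => kuldip",
])
print(rules["raam"])  # -> ram
```

Tokens that match a rule are replaced and (in Elasticsearch) marked as keywords so later stemmers leave them alone; tokens with no rule pass through unchanged.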
Then create an index with adequate settings (I use the analyzer from this example; you can create a different analyzer, the only important part here is the "custom_stems" filter):
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "custom_stems"]
        }
      },
      "filter": {
        "custom_stems": {
          "type": "stemmer_override",
          "rules_path": "analysis/custom_stems.txt"
        }
      }
    }
  }
}
Test that it works:
GET /my_index/_analyze
{
  "text": ["Rama"],
  "analyzer": "my_analyzer"
}
You should see in the output:
{
  "tokens": [
    {
      "token": "ram",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
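Conceptually, the analyzer runs three stages in order: the standard tokenizer splits the text, the lowercase filter normalizes case, and the stemmer_override filter swaps in the canonical forms. A minimal Python sketch of that chain (an approximation for illustration only; the real standard tokenizer is more sophisticated than a regex, and the STEMS dict stands in for the rules file):

```python
import re

# stand-in for the custom_stems.txt rules above
STEMS = {"rama": "ram", "raam": "ram", "kuldeep": "kuldip"}

def analyze(text):
    # 1. tokenize (crude approximation of the standard tokenizer)
    tokens = re.findall(r"\w+", text)
    # 2. lowercase filter, then 3. stemmer_override lookup
    return [STEMS.get(t.lower(), t.lower()) for t in tokens]

print(analyze("Rama"))  # -> ['ram']
```

This mirrors why "Rama" comes back as the single token "ram" in the _analyze response above: lowercasing produces "rama", which the override rules map to "ram".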
Note that I used:
Upvotes: 1