I am retrieving term statistics from Elasticsearch and get the following result:
"tevez's": {
  "doc_freq": 165,
  "ttf": 245,
  "term_freq": 1,
  "tokens": [
    {
      "position": 722,
      "start_offset": 4077,
      "end_offset": 4084
    }
  ],
  "score": 9.041515
}
How can I tell Elasticsearch to treat "tevez's" and "tevez" as the same term?
I also get:
"benched": {
  "doc_freq": 130,
  "ttf": 140,
  "term_freq": 1,
  "tokens": [
    {
      "position": 757,
      "start_offset": 4292,
      "end_offset": 4299
    }
  ],
  "score": 9.278306
}
How can I tell Elasticsearch to treat "benched" and "bench" as the same term?
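Use the possessive_english stemmer to remove the trailing 's, and porter (or another English stemmer) to strip tense and other inflectional endings, so that benched becomes bench. The Elasticsearch stemmer token filter documentation has the full list of available English stemmers.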
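To see what the two filters are meant to do, here is a small pure-Python sketch. It is not Elasticsearch code: strip_possessive only approximates the possessive_english filter, and only steps 1a and 1b of the Porter algorithm are implemented (the real english/porter stemmer runs further steps), which happens to be enough for the two words in the question. All function names are hypothetical.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: a, e, i, o, u are vowels; 'y' counts as a
    vowel only when it follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's m: the number of vowel-consonant sequences in the stem."""
    kinds = ["C" if is_consonant(stem, i) else "V" for i in range(len(stem))]
    return sum(1 for a, b in zip(kinds, kinds[1:]) if a == "V" and b == "C")


def contains_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(stem: str) -> bool:
    return len(stem) >= 2 and stem[-1] == stem[-2] and is_consonant(stem, len(stem) - 1)


def ends_cvc(stem: str) -> bool:
    """Consonant-vowel-consonant ending whose last letter is not w, x or y."""
    n = len(stem)
    return (
        n >= 3
        and is_consonant(stem, n - 3)
        and not is_consonant(stem, n - 2)
        and is_consonant(stem, n - 1)
        and stem[-1] not in "wxy"
    )


def strip_possessive(token: str) -> str:
    """Drop a trailing 's (or \u2019s), similar in spirit to possessive_english."""
    if len(token) >= 2 and token[-2] in "'\u2019" and token[-1] in "sS":
        return token[:-2]
    return token


def porter_step1a(word: str) -> str:
    """Plural endings: sses -> ss, ies -> i, ss -> ss, s -> ''."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def porter_step1b(word: str) -> str:
    """Past-tense and -ing endings, followed by Porter's clean-up rules."""
    if word.endswith("eed"):
        return word[:-1] if measure(word[:-3]) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if not contains_vowel(stem):
                return word
            if stem.endswith(("at", "bl", "iz")):
                return stem + "e"
            if ends_double_consonant(stem) and stem[-1] not in "lsz":
                return stem[:-1]
            if measure(stem) == 1 and ends_cvc(stem):
                return stem + "e"
            return stem
    return word


def analyze(token: str) -> str:
    """lowercase -> possessive -> (partial) Porter, in that order."""
    return porter_step1b(porter_step1a(strip_possessive(token.lower())))


if __name__ == "__main__":
    for t in ["Tevez's", "tevez", "benched", "bench"]:
        print(t, "->", analyze(t))
```

The order in analyze matters: in this sketch, running porter_step1a before strip_possessive turns tevez's into tevez', which the possessive rule no longer matches. That is why the possessive filter should come before the stemmer.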
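Then define both filters and a custom analyzer in the index settings. Analysis settings have to be supplied when the index is created (or while it is closed). Put the possessive filter before the stemmer; otherwise the stemmer strips the final s from tevez's first and the possessive filter no longer matches:
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "possessive": {
            "type": "stemmer",
            "language": "possessive_english"
          },
          "porter": {
            "type": "stemmer",
            "language": "english"
          }
        },
        "analyzer": {
          "custom_english": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "possessive",
              "porter"
            ]
          }
        }
      }
    }
  }
}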
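If you create the index from a script, a minimal sketch using only Python's standard library could build the request like this. The endpoint (http://localhost:9200), index name (my_index) and field name (content) are hypothetical placeholders, and the typeless mappings block assumes Elasticsearch 7 or later. Because term statistics are computed with the field's analyzer, the sketch also maps the field to custom_english.

```python
import json
import urllib.request

# Hypothetical placeholders; replace with your own cluster, index and field.
ENDPOINT = "http://localhost:9200"
INDEX = "my_index"
FIELD = "content"


def build_index_body(field: str) -> dict:
    """Analysis settings plus a typeless (Elasticsearch 7+) mapping
    that applies the custom analyzer to one text field."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    "filter": {
                        "possessive": {"type": "stemmer", "language": "possessive_english"},
                        "porter": {"type": "stemmer", "language": "english"},
                    },
                    "analyzer": {
                        "custom_english": {
                            "tokenizer": "standard",
                            # Strip 's before the stemmer sees the token.
                            "filter": ["lowercase", "possessive", "porter"],
                        }
                    },
                }
            }
        },
        "mappings": {
            "properties": {field: {"type": "text", "analyzer": "custom_english"}}
        },
    }


def create_index_request(endpoint: str, index: str, field: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the index."""
    body = json.dumps(build_index_body(field)).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint}/{index}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = create_index_request(ENDPOINT, INDEX, FIELD)
    print(req.get_method(), req.full_url)
    # To actually create the index against a running cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
```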
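Finally, set custom_english as the analyzer of the field in your mapping (existing documents have to be reindexed before their term statistics change), and request $endpoint/$index/_analyze?analyzer=custom_english&text=$text to see the effect of the stemmers.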
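Newer Elasticsearch versions expect analyzer and text in a JSON request body rather than in the query string. A minimal sketch that builds such a request with Python's standard library, assuming the same hypothetical endpoint and index name (my_index) as any local test setup:

```python
import json
import urllib.request


def analyze_request(endpoint: str, index: str, analyzer: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the _analyze API with a JSON body."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_strings(response: dict) -> list:
    """Pull the token strings out of an _analyze response body."""
    return [t["token"] for t in response.get("tokens", [])]


if __name__ == "__main__":
    # Hypothetical endpoint and index name.
    req = analyze_request("http://localhost:9200", "my_index", "custom_english", "Tevez's benched")
    print(req.get_method(), req.full_url, req.data.decode())
    # Against a running cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(token_strings(json.load(resp)))
```

With the analyzer configured as above, the tokens for "Tevez's benched" should come back as tevez and bench.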