I tried to apply a custom English analyzer, as well as the built-in english analyzer, in Elasticsearch. My main goal is stemming. Say my documents contain the words covers and impression.
Now, if I search for, e.g., cover, impressive or impressions, I get 0 results. I only get hits when I search for the exact terms "covers" or "impression".
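A quick way to see what the built-in english analyzer actually does to these words is the _analyze API, which needs no index when you name a built-in analyzer. A sketch, assuming the request is sent from Kibana Dev Tools or curl as POST /_analyze; the sample text is just the words from this question. Comparing the returned tokens with the terms you search for shows whether stemming is the problem or the mapping/query is.

```json
{
  "analyzer": "english",
  "text": "covers cover impression impressions impressive"
}
```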
These are my settings in Elasticsearch (based on this documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html):
{
"settings": {
"analysis": {
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
},
"english_stemmer": {
"type": "stemmer",
"language": "english"
},
"english_possessive_stemmer": {
"type": "stemmer",
"language": "possessive_english"
}
},
"analyzer": {
"rebuilt_english": {
"tokenizer": "standard",
"filter": [
"english_possessive_stemmer",
"lowercase",
"english_stop",
"english_stemmer"
]
}
}
}
}
}
My mapping looks as follows:
"mapping": {
"_doc": {
"properties": {
"title": {"type": "text",
"analyzer": "rebuilt_english"},
"description": {"type": "text",
"analyzer": "rebuilt_english"}
}
}
}
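Once the index exists, _analyze can also run text through whatever analyzer a field is mapped to, which confirms whether the mapping took effect. A sketch, assuming the index is called my_index and has a title field: send this body to POST /my_index/_analyze. If the tokens come back lowercased but unstemmed, the field is most likely still using the default standard analyzer rather than rebuilt_english.

```json
{
  "field": "title",
  "text": "covers impressions"
}
```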
Following a few different tutorials, I also tried changing the settings like this (I only show the changes here, not the full code again):
{
"settings": {
"analysis": {
"analyzer": {
"rebuilt_english": {
"type": "custom",
"filter": #and so on...
Am I missing something here? As I understand it, I need to define the analyzer under "settings", give it a name, and then use that name in the "mappings" properties, so that each field is analyzed with those settings.
I also tried not defining any custom settings and instead just setting the analyzer for each field in the mapping, like:
"title": {"type": "text",
"analyzer": "english"}
That doesn't work either (not even when adding filters such as stemming).
I have spent hours trying to find a solution, but I can't get it to work. Any help would be much appreciated. Thanks!
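It is also worth checking the query type. A match query runs the search text through the field's analyzer, so "impressions" is stemmed the same way as the indexed text; a term query does not analyze its input and only matches the exact indexed token. A sketch of a match query, assuming index my_index and field title, sent as GET /my_index/_search:

```json
{
  "query": {
    "match": {
      "title": "impressions"
    }
  }
}
```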
UPDATE
This is the code I used to create the index (my latest attempt; as described above, I also tried other ways of applying the analyzer):
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"rebuilt_english": {
"type": "custom",
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english"
},
"english_stemmer": {
"type": "stemmer",
"language": "english"
},
"english_possessive_stemmer": {
"type": "stemmer",
"language": "possessive_english"
},
"tokenizer": "standard",
"filter": [
"english_possessive_stemmer",
"lowercase",
"english_stop",
"english_stemmer"
]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"title": { "type": "text",
"analyzer": "rebuilt_english"
},
"description": { "type": "text",
"analyzer": "rebuilt_english"}
}
}
}
}
}
The analyzer below should work. The fix: since you have already defined "tokenizer": "standard", don't also define a "type": "standard" field.
PUT /analyzers_test
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_stemmer"
]
}
},
"filter": {
"my_stemmer": {
"type": "stemmer",
"name": "english"
}
}
}
}
}
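The same chain can be tried before creating any index, because _analyze also accepts a tokenizer plus a list of filters, including inline filter definitions. A sketch that mirrors my_analyzer above with lowercase placed first (Lucene's English stemmers expect lowercased input), sent as POST /_analyze:

```json
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "stemmer", "language": "english" }
  ],
  "text": "Covers Impressions"
}
```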
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"english_stop": {
"type":"standard",
"stopwords": "_english_"
},
"my_analyzer": {
"type":"custom",
"tokenizer":"standard",
"filter":["lowercase", "my_stemmer"]
}
},
"filter": {
"my_stemmer":{
"type": "stemmer",
"language": "english"
}
}
}
}
}
POST /my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "I'm in the mood for drinking semi-dry wine!"
}
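The _analyze request shows the tokens my_analyzer produces, so you can check the stemming directly. I think this will help. Thanks.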
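Your issue was that the filter key, which holds all your named filters, was in the wrong place. It was placed inside analyzer, but it is supposed to be a sibling key of analyzer. Because of the misplaced braces, "mappings" also ended up nested inside "settings"; it must be a top-level key next to "settings".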
So my bet is that the following config should work as expected:
{
"settings":{
"analysis":{
"filter":{
"english_stop":{
"type":"stop",
"stopwords":"_english_"
},
"english_stemmer":{
"type":"stemmer",
"language":"english"
},
"english_possessive_stemmer":{
"type":"stemmer",
"language":"possessive_english"
}
},
"analyzer":{
"rebuilt_english":{
"type":"custom",
"tokenizer":"standard",
"filter":[
"english_possessive_stemmer",
"lowercase",
"english_stop",
"english_stemmer"
]
}
}
},
"mappings":{
"_doc":{
"properties":{
"title":{
"type":"text",
"analyzer":"rebuilt_english"
},
"description":{
"type":"text",
"analyzer":"rebuilt_english"
}
}
}
}
}
}
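One caveat, assuming a newer cluster: mapping types were removed in Elasticsearch 7.0, so on 7.x and later the _doc level inside "mappings" is rejected (7.x accepts it only with include_type_name=true). Under that assumption, the "mappings" section of the request above would become the following, with "settings" left unchanged:

```json
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "rebuilt_english"
      },
      "description": {
        "type": "text",
        "analyzer": "rebuilt_english"
      }
    }
  }
}
```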