Hans Wassink

Reputation: 2577

Language analyzer doesn't find singular results

I have a bunch of categories with translated titles. I have defined language analyzers for the title fields in my index so that I can search them, but a search doesn't match the singular form of a word: wasmachine is the singular of wasmachines, which is stored in titles.title-nl, yet searching for wasmachine finds nothing. What am I missing?

Demo document

    "_source" : {
      "google_id" : 2706,
      "titles" : [
        {
          "title-en" : "laundry appliances",
          "title-de" : "waschen & trocknen",
          "title-fr" : "appareils de blanchisserie",
          "title-nl" : "wasmachines"
        }
      ]
    }

How I mapped them

PUT categories/_mapping/category
{
  "dynamic": false,
  "properties": {
    "titles.title-nl": {
      "type": "text",
      "analyzer": "dutch"
    },
    "titles.title-en": {
      "type": "text",
      "analyzer": "english"
    },
    "titles.title-de": {
      "type": "text",
      "analyzer": "german"
    },
    "titles.title-fr": {
      "type": "text",
      "analyzer": "french"
    }
  }
}

The way I search for them

GET categories/_search
{
  "size": 4, 
  "query": {
    "multi_match": {
      "query": "wasmachines",
      "fields": ["titles.title-de","titles.title-en", "titles.title-fr", "titles.title-nl"]
    }
  }
}

Upvotes: 2

Views: 397

Answers (1)

leandrojmp

Reputation: 7473

The problem is that the default dutch analyzer doesn't know how to stem the word wasmachines, so the singular and plural forms end up as different terms. You will need to recreate your index with a custom analyzer that uses a stemmer_override filter.
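
One way to check this for yourself, as a sketch rather than a definitive diagnosis, is to run both forms through the built-in dutch analyzer with the _analyze API and compare the tokens it returns. If the two tokens differ, a search for one form cannot match a document indexed with the other.

```
GET _analyze
{
  "analyzer": "dutch",
  "text": "wasmachine wasmachines"
}
```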

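Following the Elasticsearch documentation, you can rebuild the dutch analyzer as a custom analyzer and add a rule for this word to the stemmer_override filter. Rules have the form token => stem, so the rule wasmachine => wasmachines maps the singular token to wasmachines, which lets the singular be indexed and searched under the same term as the plural.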

PUT categories/
{
  "settings": {
    "analysis": {
      "filter": {
        "dutch_stop": {
          "type":       "stop",
          "stopwords":  "_dutch_" 
        },
        "dutch_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["voorbeeld"] 
        },
        "dutch_stemmer": {
          "type":       "stemmer",
          "language":   "dutch"
        },
        "dutch_override": {
          "type":       "stemmer_override",
          "rules": [
            "fiets=>fiets",
            "bromfiets=>bromfiets",
            "wasmachine=>wasmachines",
            "ei=>eier",
            "kind=>kinder"
          ]
        }
      },
      "analyzer": {
        "rebuilt_dutch": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "dutch_stop",
            "dutch_keywords",
            "dutch_override",
            "dutch_stemmer"
          ]
        }
      }
    }
  }
}

You will also need to use that new analyzer in your mapping:

PUT categories/_mapping/category
{
    "dynamic": false,
    "properties": {
        "titles.title-nl": {
            "type": "text",
            "analyzer": "rebuilt_dutch"
        },
        "titles.title-en": {
            "type": "text",
            "analyzer": "english"
        },
        "titles.title-de": {
            "type": "text",
            "analyzer": "german"
        },
        "titles.title-fr": {
            "type": "text",
            "analyzer": "french"
        }
    } 
}

After that you will be able to search for wasmachine and get the documents that have wasmachines.
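
To confirm the result, a minimal check (assuming the index was recreated with the settings and mapping above) is to analyze both forms with rebuilt_dutch and then search for the singular. If the override is applied, both forms in the first request should come back as the same token.

```
GET categories/_analyze
{
  "analyzer": "rebuilt_dutch",
  "text": "wasmachine wasmachines"
}

GET categories/_search
{
  "query": {
    "match": {
      "titles.title-nl": "wasmachine"
    }
  }
}
```

The match query analyzes the search text with the analyzer mapped on titles.title-nl, so the override applies at search time as well as at index time.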

Upvotes: 3
