Brian Litzinger

Reputation: 5608

ElasticSearch _suggest queries are case-sensitive. How can I make them case-insensitive?

I'm currently performing a search with this endpoint and request:

curl -XPOST 'elasticserver.com/citysuggest/_suggest' -d '{
  "result": {
    "text": "Chicago",
    "completion": {
      "field": "autoCompleteName"
    }
  }
}'

This is my index mapping:

{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
        "index": {
            "mapper": {
                "dynamic": false
            }
        },
        "analysis": {
            "analyzer": {
                "str_search_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["standard", "str_delimiter", "asciifolding", "porter_stem"]
                },
                "str_index_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["standard", "str_delimiter", "asciifolding", "porter_stem"],
                    "char_filter": "html_strip"
                }
            },
            "filter": {
                "str_delimiter": {
                    "type": "word_delimiter",
                    "generate_word_parts": true,
                    "catenate_words": true,
                    "catenate_numbers": true,
                    "catenate_all": true,
                    "split_on_case_change": true,
                    "preserve_original": true,
                    "split_on_numerics": true,
                    "stem_english_possessive": true
                }
            }
        }
    },
    "mappings": {
        "city": {
            "_source": {
                "enabled": false
            },
            "dynamic": false,
            "properties": {
                "_all": {
                    "enabled": false
                },
                "autoCompleteName": {
                    "type": "completion",
                    "index_analyzer": "str_index_analyzer",
                    "search_analyzer": "str_search_analyzer"
                }
            }
        }
    }
}

When I search for "Chicago", it returns the expected results because it finds a match for Chicago. However, when I search for "chicago", it returns nothing. I can't for the life of me figure out what I need to change to make the search case-insensitive. If a user types "ChiCAgO", it should return my Chicago result, but instead I get nothing back.

To test my analyzers I ran this:

elasticserver.com/citysuggest/_analyze?text=ChicaGo&pretty

and I get what looks like a properly tokenized value.

{
  "tokens": [
    {
      "token": "chicago",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

Upvotes: 2

Views: 829

Answers (1)

Olly Cruickshank

Reputation: 6180

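You just need to add the lowercase token filter to both of your analyzers. Neither of them lowercases its tokens at the moment, so "Chicago" is indexed with a capital C and a lowercase query never matches it.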

 "analysis": {
     "analyzer": {
         "str_search_analyzer": {
             "tokenizer": "standard",
             "filter": ["standard", "str_delimiter", "asciifolding", "porter_stem", "lowercase"]
         },
         "str_index_analyzer": {
             "tokenizer": "standard",
             "filter": ["standard", "str_delimiter", "asciifolding", "porter_stem", "lowercase"],
             "char_filter": "html_strip"
         }
     },
     "filter": {
         "str_delimiter": {
             "type": "word_delimiter",
             "generate_word_parts": true,
             "catenate_words": true,
             "catenate_numbers": true,
             "catenate_all": true,
             "split_on_case_change": true,
             "preserve_original": true,
             "split_on_numerics": true,
             "stem_english_possessive": true
         }
     }
 }

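Your test case worked because you didn't specify an analyzer, so _analyze fell back to the default analyzer, which lowercases its tokens. To see what your own analyzer produces, try: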

curl -XGET 'localhost:9200/citysuggest/_analyze?analyzer=str_index_analyzer&text=ChicaGo&pretty'
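
If it helps to see why the missing filter matters, here is a toy model in Python. It is not Elasticsearch code: `analyze` and `suggest` are hypothetical helpers, the "analyzer" just splits on whitespace, and the city list is made up for illustration. It only shows the principle that the indexed form and the query form have to be normalized the same way before a prefix match can succeed.

```python
def analyze(text, lowercase):
    """Toy stand-in for an analyzer: split on whitespace, optionally lowercase.

    Assumption: real analyzers do far more (stemming, folding, delimiters);
    this only models the presence or absence of a lowercase filter.
    """
    tokens = text.split()
    if lowercase:
        tokens = [t.lower() for t in tokens]
    return " ".join(tokens)


def suggest(indexed_names, query, lowercase):
    """Return the original names whose analyzed form starts with the analyzed query."""
    analyzed_query = analyze(query, lowercase)
    return [
        name
        for name in indexed_names
        if analyze(name, lowercase).startswith(analyzed_query)
    ]


# Hypothetical sample data, not taken from the question's index.
cities = ["Chicago", "Chico", "Charlotte"]

# Without lowercasing, only an exact-case prefix matches.
print(suggest(cities, "Chi", lowercase=False))      # ['Chicago', 'Chico']
print(suggest(cities, "chi", lowercase=False))      # []

# With lowercasing on both sides, the case of the input no longer matters.
print(suggest(cities, "ChiCAgO", lowercase=True))   # ['Chicago']
```

The second call is the situation in the question: the stored value keeps its capital letter, the query does not, and the prefix comparison fails.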

Upvotes: 2
