fast_cen

Reputation: 1377

Elasticsearch ngram query doesn't work

I upgraded from Elasticsearch 2.0 to 5.2 and ngram search is now broken!

The Elasticsearch setup is below. It is just a simple ngram token filter applied to the title and summary fields.

settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "analysis": {
            "filter": {
                "ngram_filter": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 10
                }
            },
            "analyzer": {
                "search_ngram_analyzer": {
                    "tokenizer": "standard",
                    "type": "custom",
                    "filter": ["standard", "lowercase", "stop", "asciifolding"]
                },

                "index_ngram_analyzer": {
                    "tokenizer": "standard",
                    "type": "custom",
                    "filter": ["standard", "lowercase", "stop", "asciifolding", "ngram_filter"]
                }
            }
        },

    },
    "mappings": {
        "docs": {
            "properties": {
                "title": {
                    "boost": 100.0,
                    "search_analyzer": "search_ngram_analyzer",
                    "analyzer": "index_ngram_analyzer",
                    "type": "text",
                },
                "summary": {
                    "boost": 20.0,
                    "search_analyzer": "search_ngram_analyzer",
                    "analyzer": "index_ngram_analyzer",
                    "type": "text",
                }
            }
        }

    }
}

http://localhost:9200/my_index/_search?q=example returns the document containing the word "example", as a normal query would.

However, http://localhost:9200/my_index/_search?q=exampl (without the trailing "e") returns an empty result!

I can't find the error in my setup. Is this a breaking API change?

Upvotes: 0

Views: 1733

Answers (1)

xeraa

Reputation: 10859

Are you sure this has worked in previous versions?

If you use the URI search and don't specify a field (as you do in http://localhost:9200/my_index/_search?q=exampl), then the query runs against the _all field. That field uses the standard analyzer, so it contains no ngrams. The query you want to use is /my_index/_search?q=title:exampl
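To see why the field matters, here is a rough Python sketch of the two term sets. It is a simplified model, not Elasticsearch's actual analysis chain: the helper names (`ngrams`, `standard_terms`, `index_ngram_terms`) are made up for illustration, the standard analyzer is approximated by lowercasing and splitting on whitespace, and the stop and asciifolding filters are ignored.

```python
def ngrams(token, min_gram=3, max_gram=10):
    """Return every substring of token with length between min_gram and max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        # For sizes longer than the token this range is empty, so nothing is added.
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def standard_terms(text):
    """Crude stand-in for the standard analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def index_ngram_terms(text):
    """Crude stand-in for index_ngram_analyzer: standard terms expanded into ngrams."""
    terms = set()
    for token in standard_terms(text):
        terms |= ngrams(token)
    return terms


title = "This is an example"

# What a field analyzed with the standard analyzer (such as _all) would hold.
all_field_terms = standard_terms(title)

# What the title field would hold after the ngram filter.
title_field_terms = index_ngram_terms(title)

print("exampl" in all_field_terms)    # whole words only, so the partial word is absent
print("exampl" in title_field_terms)  # "exampl" is a 6-character ngram of "example"
```

Under these assumptions, the partial word only exists as a term in the ngram-analyzed field, which is why the search has to name title (or summary) explicitly.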

For the sake of reproducibility, here is the dump of the entire example for Console:

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "nGram",
          "min_gram": 3,
          "max_gram": 10
        }
      },
      "analyzer": {
        "search_ngram_analyzer": {
          "tokenizer": "standard",
          "type": "custom",
          "filter": [
            "standard",
            "lowercase",
            "stop",
            "asciifolding"
          ]
        },
        "index_ngram_analyzer": {
          "tokenizer": "standard",
          "type": "custom",
          "filter": [
            "standard",
            "lowercase",
            "stop",
            "asciifolding",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "docs": {
      "properties": {
        "title": {
          "boost": 100,
          "search_analyzer": "search_ngram_analyzer",
          "analyzer": "index_ngram_analyzer",
          "type": "text"
        },
        "summary": {
          "boost": 20,
          "search_analyzer": "search_ngram_analyzer",
          "analyzer": "index_ngram_analyzer",
          "type": "text"
        }
      }
    }
  }
}


GET /my_index/_analyze
{
  "analyzer": "index_ngram_analyzer",
  "text": "example exampl"
}

GET /my_index/_analyze
{
  "analyzer": "search_ngram_analyzer",
  "text": "example exampl"
}

POST /my_index/docs
{
  "title": "This is an example",
  "summary": "Some more text"
}

GET /my_index/_search?q=example
GET /my_index/_search?q=exampl
GET /my_index/_search?q=title:exampl

DELETE /my_index

Upvotes: 4
