CrystalCase

Reputation: 187

Elasticsearch returns unexpected results from search using edge_ngram

I am working out how to store my data in Elasticsearch. First I tried fuzzy matching, and while that worked okay, I did not get the results I expected. Afterwards I tried the ngram and then the edge_ngram tokenizer. The edge_ngram tokenizer looked like it works like an autocomplete, which is exactly what I needed. But it still gives unexpected results. I configured min_gram 1 and max_gram 5 to get all results starting with the first letter I search for. While this works, I still get those same results as I continue typing.

Example: I have a name field, with documents named The New York Times and The Guardian. When I search for T, both show up as expected. But the same happens when I search for TT, TTT and so on.

It does not matter whether I execute the search in Kibana or from my application (which uses MultiMatch on all fields). Kibana even shows me that it matched the single letter T.

So what did I miss and how can I achieve getting the results like with an autocomplete but without having too many results?

Upvotes: 0

Views: 53

Answers (1)

Bhavya

Reputation: 16192

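When defining your index mapping, you need to specify search_analyzer as standard. If no search_analyzer is defined explicitly, Elasticsearch uses the field's analyzer at search time as well. In your case that means the query TT is itself split into the edge n-grams t and tt, and the t token alone matches every document containing a word that starts with T.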
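To make the difference concrete, here is a small Python sketch that imitates the two analyzers without needing a running cluster. It is only a rough approximation: the function names (edge_ngrams, standard, matches) are made up for illustration, the regexes stand in for the real tokenizers, and the match query is modelled as "any query token equals any indexed token" (the default OR behaviour).

```python
import re

def edge_ngrams(text, min_gram=1, max_gram=5):
    """Approximate an edge_ngram tokenizer (token_chars: letter) plus lowercase."""
    tokens = []
    for word in re.findall(r"[^\W\d_]+", text):  # runs of letters only
        word = word.lower()
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])  # prefixes of length min_gram..max_gram
    return tokens

def standard(text):
    """Very rough stand-in for the standard analyzer: whole words, lowercased."""
    return [w.lower() for w in re.findall(r"\w+", text)]

def matches(doc, query, search_analyzer):
    """Model a match query: a hit if ANY query token was indexed for the doc."""
    indexed = set(edge_ngrams(doc))  # index-time analysis is always edge_ngram
    return any(tok in indexed for tok in search_analyzer(query))

# Same analyzer at search time: "TT" becomes ["t", "tt"], and "t" hits "the".
print(edge_ngrams("TT"))
print(matches("The Guardian", "TT", edge_ngrams))

# Standard analyzer at search time: "TT" stays ["tt"], which was never indexed.
print(matches("The Guardian", "TT", standard))
print(matches("The Guardian", "Gua", standard))
```

One thing the sketch also makes visible: with max_gram set to 5, only prefixes up to five characters are indexed, so with a standard search_analyzer a query term longer than that (for example guardian) would not match. You may want a larger max_gram if you expect longer inputs.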

Here is a working example with index data, mapping, search query and search result:

Index Mapping:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 5,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}

Index Data:

{
    "name":"The Guardian"
}
{
    "name":"The New York Times"
}

Search Query:

{
  "query": {
    "match": {
      "name": "T"
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "69027911",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.23092544,
        "_source": {
          "name": "The New York Times"
        }
      },
      {
        "_index": "69027911",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.20824991,
        "_source": {
          "name": "The Guardian"
        }
      }
    ]

Upvotes: 1
