blacksoul

Reputation: 43

Elasticsearch match query does not match a document with apostrophe

I'm building a search feature for a localities autocomplete, a simpler version of the Google Maps one. Everything seemed to be working fine with the query I was using:

{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "Ametlla",
          "type": "best_fields",
          "fields": [
            "locality",
            "alternative_names"
          ],
          "operator": "and"
        }
      },
      "filter": {
        "term": {
          "country_code": "ES"
        }
      }
    }
  }
}

The issue I discovered is related to a city in Spain: L'Ametlla de Mar.

/localities_index/localities/10088

{
  "_index": "localities_index",
  "_type": "localities",
  "_id": "10088",
  "_version": 1,
  "_seq_no": 133,
  "_primary_term": 4,
  "found": true,
  "_source": {
    "country_code": "es",
    "locality": "L'Ametlla de Mar",
    "alternative_names": []
  }
}

You can search for Ametlla and the document is matched (see the following partial-name example query):

{
    "query": {
        "match": {
            "locality": {
                "query" : "Ametlla"
            }
        }
    }
}

/localities_index/localities/10088/_explain

{
  "_index": "localities_index",
  "_type": "localities",
  "_id": "10088",
  "matched": true,
  "explanation": {
    "value": 3.3985975,
    "description": "weight(locality:ametlla in 2) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 3.3985975,
        "description": "score(freq=1.0), product of:",
        "details": [
          {
            "value": 2.2,
            "description": "boost",
            "details": []
          },
          {
            "value": 3.6686769,
            "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
            "details": [
              {
                "value": 2,
                "description": "n, number of documents containing term",
                "details": []
              },
              {
                "value": 97,
                "description": "N, total number of documents with field",
                "details": []
              }
            ]
          },
          {
            "value": 0.4210829,
            "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
            "details": [
              {
                "value": 1.0,
                "description": "freq, occurrences of term within document",
                "details": []
              },
              {
                "value": 1.2,
                "description": "k1, term saturation parameter",
                "details": []
              },
              {
                "value": 0.75,
                "description": "b, length normalization parameter",
                "details": []
              },
              {
                "value": 9.0,
                "description": "dl, length of field",
                "details": []
              },
              {
                "value": 7.5360823,
                "description": "avgdl, average length of field",
                "details": []
              }
            ]
          }
        ]
      }
    ]
  }
}

but if you search for its full name, L'Ametlla de Mar, it is not matched.
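
For reference, this is the shape of the query that does not match the document (the exact query string is an assumption; it is the same match query as above, just with the apostrophe form of the name):

{
    "query": {
        "match": {
            "locality": {
                "query" : "L'Ametlla de Mar"
            }
        }
    }
}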

I've tried adding punctuation to token_chars, as suggested in https://stackoverflow.com/a/49362505, but it didn't work. Then I tried adding ' as custom_token_chars, and that didn't work either. These are the current index settings:

/localities_index/_settings

{
  "localities_index": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "provided_name": "localities_index",
        "creation_date": "1596537683568",
        "analysis": {
          "analyzer": {
            "autocomplete": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "tokenizer": "autocomplete"
            },
            "autocomplete_search": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "tokenizer": "lowercase"
            }
          },
          "tokenizer": {
            "autocomplete": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "custom_token_chars": "'",
              "min_gram": "1",
              "type": "edge_ngram",
              "max_gram": "15"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "lS3Ork2zSySYJbJYmx29aw",
        "version": {
          "created": "7040099"
        }
      }
    }
  }
}

/localities_index/_mapping

{
  "localities_index": {
    "mappings": {
      "properties": {
        "alternative_names": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "country_code": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "locality": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
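
To compare what the index-time and search-time analyzers actually produce for the name, the _analyze API can be run against the index (a debugging sketch; the token output is omitted because it depends on the settings above):

POST /localities_index/_analyze

{
  "analyzer": "autocomplete",
  "text": "L'Ametlla de Mar"
}

Running the same request with "analyzer": "autocomplete_search" shows the tokens produced at query time, which makes it easy to spot where the two sides diverge.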

Upvotes: 0

Views: 883

Answers (1)

Amit

Reputation: 32386

You can use the apostrophe token filter in your custom analyzer and apply that analyzer to the field that contains the apostrophes (locality). Then keep using the match query you already have: it will use the same analysis that is applied at index time, and you will get the expected result.
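
A minimal sketch of what that could look like, reusing the analysis settings from the question and adding the built-in apostrophe token filter to both analyzers (the filter order is an assumption and this is not a tested configuration; localities_index_v2 is a hypothetical new index name, since existing documents would need to be reindexed into it to pick up the change):

PUT /localities_index_v2

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "apostrophe",
            "lowercase",
            "asciifolding"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase",
          "filter": [
            "apostrophe",
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": "1",
          "max_gram": "15",
          "token_chars": [
            "letter",
            "digit"
          ],
          "custom_token_chars": "'"
        }
      }
    }
  }
}

Note that the apostrophe filter can only act on apostrophes the tokenizer keeps inside its tokens; with the edge_ngram tokenizer that appears to require listing "custom" in token_chars alongside the custom_token_chars setting the question already uses.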

Upvotes: 1
