A_xay Queuemar

How to handle alphanumeric combinations (like "Hotel101") in Elasticsearch query results with multi_match?

I am working with Elasticsearch and I have an index that contains entries with alphanumeric combinations like "Hotel101 fort". When I search with a query like "Hotel 101 fort", I don't get the expected results. Elasticsearch splits the query into the separate tokens "Hotel", "101", and "fort", but the indexed document is tokenized as "Hotel101" and "fort", so "Hotel" and "101" never match "Hotel101".

My goal is to ensure that "Hotel101" in the document matches a query like "Hotel 101 fort" without explicitly specifying an analyzer in the query. Here's what I've tried so far:

I am using a multi_match query to search across multiple fields. I have defined a custom analyzer using ngram and synonym filters, but the issue persists.

I tried using ngrams for partial matching, but that doesn't solve the tokenization of alphanumeric combinations. I also tried synonym filters, but then I would have to add every possible alphanumeric combination to the synonym list, which seems very inefficient.

How can I configure Elasticsearch to handle alphanumeric combinations like "Hotel101" and "Hotel 101" as the same token during both indexing and querying, without specifying an analyzer in every query?

Answers (1)

Sagar Patel

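You need to use the word_delimiter token filter in a custom analyzer. With its default settings, word_delimiter splits a token wherever letters and digits meet, so "Hotel101" is indexed as "hotel" and "101", the same tokens that "Hotel 101" produces. Because the analyzer is set on the field in the mapping, Elasticsearch applies it at both index time and search time, so queries on that field use it without naming an analyzer in the query.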

Below is a sample configuration:

Index Mapping:

PUT test3
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom", 
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "word_delimiter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
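To check what the analyzer produces before indexing anything, you can run it directly with the _analyze API. This sketch assumes the test3 index from the mapping above already exists. With word_delimiter's default settings, the text "Hotel101 fort" should come back as the tokens hotel, 101, and fort.

```
POST test3/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Hotel101 fort"
}
```

Running the same request with "Hotel 101 fort" as the text should return the same three tokens, which is why the document and the query line up.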

Sample Document:

POST test3/_doc/1
{
  "title":"Hotel101 fort"
}

Sample Query:

POST test3/_search
{
  "query": {
    "match": {
      "title": "Hotel 101 fort"
    }
  }
}
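The question uses multi_match rather than match. Since the analyzer is attached to the field in the mapping, a multi_match query should pick it up the same way, with no analyzer in the query body. This sketch searches only title. Any other field you list in fields (for example, a hypothetical description field) would need the same my_custom_analyzer in its mapping to get the same letter/digit splitting.

```
POST test3/_search
{
  "query": {
    "multi_match": {
      "query": "Hotel 101 fort",
      "fields": ["title"]
    }
  }
}
```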

Response (abridged):

{
  "hits": {
    "hits": [
      {
        "_index": "test3",
        "_id": "1",
        "_score": 0.8630463,
        "_source": {
          "title": "Hotel101 fort"
        }
      }
    ]
  }
}
