NikhiR

Reputation: 79

Search for substring in Elastic Search Java

I am working with Elasticsearch and am trying to search for a substring inside a field, for example matching the string tac against stack overflow. I am using a MultiMatchQuery for this, but it does not work. Here is a snippet of my code (first_name is the field name).

searchString = "*" + searchString.toLowerCase() + "*";
MultiMatchQueryBuilder mqb = new MultiMatchQueryBuilder("irs", first_name);
mqb.type(MultiMatchQueryBuilder.Type.PHRASE);
BoolQueryBuilder searchQuery = boolQuery();
searchQuery.should(mqb);
NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
queryBuilder.withQuery(searchQuery);
NativeSearchQuery query = queryBuilder.build();

When I search for tac, it does not return any results. When I search for stack or overflow, it does return stack overflow.

So it only matches whole words. I tried using MultiMatchQueryBuilder.Type.PHRASE_PREFIX, but that only matches phrases that start with the search string: it works with strings like stac or overf, but not with tac or tack.

Any suggestions on how to fix it?

Upvotes: 1

Views: 647

Answers (1)

Amit

Reputation: 32376

A match query is analyzed: by default, the query text goes through the same analyzer that was applied to the field at index time. I believe you are using the standard analyzer, which generates the tokens below:

POST http://localhost:9200/_analyze

{
    "text": "stack overflow",
    "analyzer" : "standard"
}

{
    "tokens": [
        {
            "token": "stack",
            "start_offset": 0,
            "end_offset": 5,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "overflow",
            "start_offset": 6,
            "end_offset": 14,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

Hence searching for tac doesn't match any token in the index. You need to change the analyzer so that the tokens produced at query time match tokens produced at index time.
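To make the mismatch concrete, here is a minimal Java sketch. It is not Elasticsearch code: the class TermMatchSketch and its methods are hypothetical, and it assumes that for simple ASCII input the standard analyzer behaves roughly like "lowercase, then split on non-alphanumeric characters".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical illustration only: a rough stand-in for the standard analyzer
// on simple ASCII text (lowercase, split on non-alphanumeric characters).
class TermMatchSketch {

    // Produce whole-word tokens, similar to the _analyze output above.
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // A match succeeds only if some query token equals some indexed token.
    static boolean matches(String indexedText, String queryText) {
        List<String> indexed = analyze(indexedText);
        return analyze(queryText).stream().anyMatch(indexed::contains);
    }

    public static void main(String[] args) {
        System.out.println(analyze("stack overflow"));          // [stack, overflow]
        System.out.println(matches("stack overflow", "stack")); // true
        System.out.println(matches("stack overflow", "tac"));   // false
    }
}
```

Because the comparison is token against token, tac never equals stack or overflow, no matter which match type is used.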

N-gram analysis can solve the issue; the example below uses an ngram token filter applied at index time.

Example

Index mapping

{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    },
    "index.max_ngram_diff" : 10
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete", 
        "search_analyzer": "standard" 
      }
    }
  }
}
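As a rough illustration of what this mapping does at index time, the following Java sketch generates the n-grams (min_gram 1, max_gram 10) for each word of the sample text and checks which search terms would find an equal indexed token. The class NGramSketch is hypothetical and only approximates the ngram token filter; it assumes the search term stays a single token, as it would with the standard search_analyzer for these inputs.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only: approximates an ngram token filter
// applied to each token produced by the tokenizer.
class NGramSketch {

    // All substrings of the token whose length is between minGram and maxGram.
    static Set<String> ngrams(String token, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Index time: every word is expanded into its n-grams.
        Set<String> indexed = new LinkedHashSet<>();
        for (String token : new String[] {"stack", "overflow"}) {
            indexed.addAll(ngrams(token, 1, 10));
        }
        // Search time: the term is kept whole and compared to the indexed grams.
        System.out.println(indexed.contains("tac"));  // true
        System.out.println(indexed.contains("tack")); // true
        System.out.println(indexed.contains("kov"));  // false: grams never span two words
    }
}
```

The trade-off is index size: each word contributes every substring in the configured length range, so a narrower min_gram/max_gram range keeps the index smaller.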

Index sample doc

{
   "title" :  "stack overflow"
}

And search query

{
    "query": {
        "match": {
            "title": "tac"
        }
    }
}

And search result

"hits": [
    {
        "_index": "65241835",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.4739784,
        "_source": {
            "title": "stack overflow"
        }
    }
]

Upvotes: 1
