Alexander Shavelev

Reputation: 324

Elasticsearch query_string doesn't search by word part

I'm sending this request:

curl -XGET 'host/process_test_3/14/_search' -d '{
  "query" : {
    "query_string" : {
      "query" : "\"*cor interface*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

And I'm getting the correct result:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 5.421598,
    "hits": [
      {
        "_index": "process_test_3",
        "_type": "14",
        "_id": "141_dashboard_14",
        "_score": 5.421598,
        "_source": {
          "obj_type": "dashboard",
          "obj_id": "141",
          "title": "Cor Interface Monitoring"
        }
      }
    ]
  }
}

But when I want to search by a word part, for example:

curl -XGET 'host/process_test_3/14/_search' -d '
{
  "query" : {
    "query_string" : {
      "query" : "\"*cor inter*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

I'm getting no results back:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : []
  }
}

What am I doing wrong?

Upvotes: 2

Views: 2163

Answers (2)

Ashish Goel

Reputation: 919

+1 to Val's solution. Just wanted to add something: since your query is relatively simple, you may want to have a look at the match/match_phrase queries. Match queries don't do the regex parsing that query_string does, and are thus lighter. You can find the details here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
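
For illustration, a match_phrase variant of the first search might look like this (a sketch assuming the same index, type and field as in the question; note that match_phrase matches whole tokens, so the partial-word case still needs Val's ngram approach):

curl -XGET 'host/process_test_3/14/_search' -d '{
  "query" : {
    "match_phrase" : {
      "title" : "cor interface"
    }
  }
}'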

Upvotes: 1

Val

Reputation: 217334

This is because your title field has probably been analyzed by the standard analyzer (the default setting), so the title Cor Interface Monitoring has been tokenized as the three tokens cor, interface and monitoring.
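
You can check this yourself with the _analyze API. As a quick sketch (using the URL-parameter form from the older 1.x/2.x releases that the string mapping below implies):

curl -XGET 'host/process_test_3/_analyze?analyzer=standard' -d 'Cor Interface Monitoring'

This should return the tokens cor, interface and monitoring, with no token matching inter on its own.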

In order to search for any substring of a word, you need to create a custom analyzer which leverages the ngram token filter, so that all substrings of each of your tokens also get indexed.

You can create your index like this:

curl -XPUT localhost:9200/process_test_3 -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "substring_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "substring"]
        }
      },
      "filter": {
        "substring": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "14": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "substring_analyzer"
        }
      }
    }
  }
}'

Then you can reindex your data (one way to do this for a single document is sketched after the list below). The title Cor Interface Monitoring will now be tokenized as:

  • co, cor, or
  • in, int, inte, inter, interf, etc.
  • mo, mon, moni, etc.

so that your second search query will now return the document you expect because the tokens cor and inter will now match.
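
As a minimal sketch of the reindexing step (assuming the same host, index, type and document as in the question), you could simply re-put the document so that it goes through the new analyzer:

curl -XPUT 'host/process_test_3/14/141_dashboard_14' -d '{
  "obj_type" : "dashboard",
  "obj_id" : "141",
  "title" : "Cor Interface Monitoring"
}'

After that, re-running the "*cor inter*" query from the question should return this document, as described above.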

Upvotes: 6
