ditojanelidze

Reputation: 90

Find documents that match the whole query in Elasticsearch

I want to write a query in Elasticsearch that returns only documents containing all the words in the search query, matched not just as complete words but also as subwords. For example, if I have a document with these values:

{
  "first_name": "didier",
  "last_name": "drogba"
}

and I search for "didi dro", this document should be returned. If I search for "david drogba", the document should be ignored because it doesn't contain the word "david", even as a subword. I tried using the ngram tokenizer but couldn't achieve what I want.

The index I created:

PUT doctors
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram"
        }
      }
    }
  }
}

then added the mapping:

PUT doctors/_doc/_mapping
{
  "properties": {
    "first_name": {
      "type": "text",
      "analyzer": "my_analyzer"
    },
    "last_name": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}

added a document:

POST doctors/_doc/1
{
  "first_name": "dito",
  "last_name": "janelidze",
  "specialism": "oftalmologist",
  "location_name":"evex saburtalo clinic",
  "brand": "Evex",
  "address":"kavtaradze street N21"
}

and my search query looks like this:

GET doctors/_doc/_search
{
  "query": {
    "multi_match": {
        "query": "david jane",
        "fields": ["first_name", "last_name"]
    }
  }
}

It gives me the document I inserted, but I don't want it because it doesn't contain the word "david".

Upvotes: 0

Views: 81

Answers (2)

Kamal Kunjapur

Reputation: 8860

Point 1: Mapping Change

The N-gram tokenizer constructs subwords of specified lengths from the input words. These lengths are set via min_gram and max_gram in the mapping, which default to 1 and 2 respectively if you don't specify them.

I've updated the mapping you provided with min_gram: 3 and max_gram: 6. (On recent Elasticsearch versions you may also need to raise index.max_ngram_diff, since by default min_gram and max_gram may only differ by 1.)

The N-gram tokenizer would then create the tokens; for didier, for example, they would be did, idi, die, ier, didi, idie, dier, didie, idier, didier, which are eventually stored in the inverted index.

With the defaults of 1 and 2 as min_gram and max_gram respectively, notice that didier and david would share id as a common subword, which is why the document was returned.
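
You can check exactly which tokens an analyzer emits with the _analyze API, for example:

POST doctors/_analyze
{
  "analyzer": "my_analyzer",
  "text": "didier"
}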


Mapping

PUT doctors
{
  "settings": {
    "index.max_ngram_diff": 5,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 6
        }
      }
    }
  }
}

Point 2: Query Change

That said, even after making the mapping change, if your query string is david jane, the query you have would search for david OR jane in first_name OR last_name. That means the document dito janelidze would still be returned (though with a lower score than one containing david jane).

Using "operator": "and" would turn the search into david AND jane in first_name OR in last_name, which isn't what you are looking for either.
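
For reference, that variant would look like the following (shown only to illustrate why it falls short):

GET doctors/_search
{
  "query": {
    "multi_match": {
      "query": "david jane",
      "fields": ["first_name", "last_name"],
      "operator": "and"
    }
  }
}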

Instead, what you can do is use the bool query below, or create another field called name, copy the values of first_name and last_name into it using copy_to, and search against that field (a sketch of that approach follows the query below).


Query

POST <your_index_name>/_search
{
  "query": {
    "bool":{
      "must": [
        {
          "match": {
            "first_name": "david"
          }
        },
        {
          "match": {
            "last_name": "jane"
          }
        }
      ]
    }
  }
}
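
For the copy_to alternative, a minimal sketch could look like this (the combined field name name and the reuse of my_analyzer are assumptions; depending on your Elasticsearch version you may need to nest the properties under the document type):

PUT doctors
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "copy_to": "name"
      },
      "last_name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "copy_to": "name"
      },
      "name": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

You could then search the combined field with "operator": "and", so every term in the query string has to match somewhere in the name value:

GET doctors/_search
{
  "query": {
    "match": {
      "name": {
        "query": "david jane",
        "operator": "and"
      }
    }
  }
}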

Unfortunately, you would need to delete and recreate the index and ingest the documents again, as the required changes are at the mapping level.

Hope this helps!

Upvotes: 1

LeBigCat

Reputation: 1770

+1 for the "and" operator on each word. Use this; it works for me (it can be used for autocomplete too).

settings:

    "analysis": {
      "filter": {
        "name_ngrams": {
          "type": "edge_ngram",
          "side": "front",
          "min_gram": "1",
          "max_gram": "20"
        }
      },
      "analyzer": {
        "partial_name": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "name_ngrams",
            "asciifolding"
          ]
        },
        "full_name": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }


mapping:

    "first_name": {
      "type": "text",
      "analyzer": "partial_name",
      "search_analyzer": "full_name"
    },
    "last_name": {
      "type": "text",
      "analyzer": "partial_name",
      "search_analyzer": "full_name"
    }
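
With that mapping in place, one way to require every word to match somewhere across both fields is a cross_fields multi_match with "operator": "and"; a sketch, assuming the index and field names from the question:

GET doctors/_search
{
  "query": {
    "multi_match": {
      "query": "dito jane",
      "type": "cross_fields",
      "fields": ["first_name", "last_name"],
      "operator": "and"
    }
  }
}

cross_fields treats the listed fields as one big field, so with "and" each term only needs to appear in one of them.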

Upvotes: 1
