Sandy

Reputation: 107

Elasticsearch gives incorrect results when using the "must_not" operator along with the "must" operator

Query example:

    GET /beta/user/_search?routing=1&q=_id:54e5dc4817cf03cbbbe490e5
{
   "from":0,
   "size":1,
   "filter":{
      "and":[
         {
            "query":{
               "nested":{
                  "path":"event",
                  "query":{
                     "bool":{
                        "must":[
                           {
                              "match":{
                                 "event.name":"e1"
                              }
                           },
                           {
                              "match":{
                                 "event.count":"4"
                              }
                           }
                        ]
                     }
                  }
               }
            }
         },
         {
            "query":{
               "nested":{
                  "path":"event",
                  "query":{
                     "bool":{
                        "must_not":{
                           "match":{
                              "event.name":"e2"
                           }
                        }
                     }
                  }
               }
            }
         }
      ]
   }
}

I am having problems with the query above, which combines the "must" and "must_not" operators in a single request. It uses "must" to require an event named "e1" and "must_not" to exclude an event named "e2".

Any help would be highly appreciated.

Upvotes: 0

Views: 145

Answers (1)

Zach

Reputation: 9731

This isn't possible with just a nested query (or nested filter). The problem is that nested documents are evaluated one at a time. Internally, Elasticsearch stores nested documents as independent Lucene documents. The root object becomes one Lucene doc, and each nested doc becomes its own Lucene doc.

This is how nested documents keep the relationship between their own fields without interacting with other nested documents. More details here

When a query is being evaluated, it iterates over each nested document one-by-one. The query can only "see" the values within that single nested doc. This means it only knows about one set of event.name and event.count at a time, and is unable to match a must against nested doc #1 and a must_not against nested doc #2.
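To make the one-at-a-time evaluation concrete, here is a minimal Python sketch. It is a toy model and not Elasticsearch code: the `nested_matches` helper and the sample document are hypothetical, and it assumes a nested query matches the root doc as soon as any single nested doc satisfies the inner query.

```python
# Toy model of nested-query evaluation; this is NOT Elasticsearch code.
# Assumption: a nested query matches the root doc if ANY one nested doc
# satisfies the inner query on its own.

def nested_matches(doc, predicate):
    """Evaluate the predicate against each nested doc in isolation."""
    return any(predicate(event) for event in doc["event"])

# Hypothetical document containing both an e1 event and an e2 event.
doc = {"event": [{"name": "e1", "count": 4}, {"name": "e2", "count": 2}]}

# Clause 1: must name == e1 and count == 4, checked inside one nested doc.
has_e1 = nested_matches(doc, lambda e: e["name"] == "e1" and e["count"] == 4)

# Clause 2: must_not name == e2, also checked inside one nested doc.
# The e1 nested doc is "not e2", so this clause matches as well.
not_e2 = nested_matches(doc, lambda e: e["name"] != "e2")

print(has_e1 and not_e2)  # True: the doc is returned although it contains e2
```

Under this model, the second nested clause never sees the whole set of events, so it cannot express "no event is named e2".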

The workaround is to denormalize the nested data back into the root object. This will allow you to check the denormalized "bag of values" for the terms you must and must_not include. For example:

Create a new index. Notice that we add "include_in_root" to the nested mapping:

PUT /nestedtest/
{
    "mappings": {
        "test" : {
            "properties" : {
                "event" : {
                    "type" : "nested",
                    "include_in_root":true,
                    "properties": {
                        "name" : {"type": "string" },
                        "count"  : {"type": "integer" }
                    }
                }
            }
        }
    }
}

Index some docs:

POST /nestedtest/test/
{
    "event": [
        {
            "name": "e1",
            "count": 1
        },
        {
            "name": "e2",
            "count": 2
        }
    ]
}

POST /nestedtest/test/
{
    "event": [
        {
            "name": "e1",
            "count": 1
        },
        {
            "name": "e3",
            "count": 3
        }
    ]
}

Now execute a search. This query is a bool which contains two must clauses:

  • The first must is the nested query. It checks that a single nested doc has the correct name and count
  • The second must is a bool evaluated against the denormalized root values. It makes sure at least one nested doc has name: e1 and that no nested doc has name: e2

The final query looks like:

GET /nestedtest/test/_search
{
   "query": {
      "bool": {
         "must": [
            {
               "nested": {
                  "path": "event",
                  "query": {
                     "bool": {
                        "must": [
                           {
                              "match": {
                                 "event.name": "e1"
                              }
                           },
                           {
                              "match": {
                                 "event.count": "1"
                              }
                           }
                        ]
                     }
                  }
               }
            },
            {
               "bool": {
                  "must": [
                     {
                        "match": {
                           "event.name": "e1"
                        }
                     }
                  ],
                  "must_not": [
                     {
                        "match": {
                           "event.name": "e2"
                        }
                     }
                  ]
               }
            }
         ]
      }
   }
}

And it returns just the doc we are interested in:

{
   "took": 2,
   "timed_out": false,
   "_shards": {...},
   "hits": {
      "total": 1,
      "max_score": 1.5155444,
      "hits": [
         {
            "_index": "nestedtest",
            "_type": "test",
            "_id": "AUus7jbcS8gWlP4VLwGZ",
            "_score": 1.5155444,
            "_source": {
               "event": [
                  {
                     "name": "e1",
                     "count": 1
                  },
                  {
                     "name": "e3",
                     "count": 3
                  }
               ]
            }
         }
      ]
   }
}
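The same logic can be sketched in plain Python against the two sample documents. This is only an illustrative model, not Elasticsearch code: the `flatten` and `matches` helpers are hypothetical, and it assumes "include_in_root" behaves like copying every nested value into a flat, multi-valued field on the root document.

```python
# Toy model, NOT Elasticsearch code. Assumption: include_in_root copies the
# nested values into flat multi-valued fields on the root document.

def flatten(doc):
    """Build the denormalized "bag of values" for each nested field."""
    root = {}
    for event in doc["event"]:
        for key, value in event.items():
            root.setdefault("event." + key, set()).add(value)
    return root

def matches(doc):
    # First must: one nested doc has name e1 AND count 1.
    nested_ok = any(e["name"] == "e1" and e["count"] == 1 for e in doc["event"])
    # Second must: the root-level bag contains e1 and does not contain e2.
    names = flatten(doc)["event.name"]
    root_ok = "e1" in names and "e2" not in names
    return nested_ok and root_ok

docs = [
    {"event": [{"name": "e1", "count": 1}, {"name": "e2", "count": 2}]},
    {"event": [{"name": "e1", "count": 1}, {"name": "e3", "count": 3}]},
]

hits = [d for d in docs if matches(d)]
print([e["name"] for e in hits[0]["event"]])  # ['e1', 'e3']
```

The nested check keeps name and count tied to the same event, while the root-level check is the only place where values from all events are visible together.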

Upvotes: 2
