Patrick Szalapski

Reputation: 9439

In ElasticSearch, how do I filter the nested documents in my result?

Suppose, in ElasticSearch 5, I have data with nesting like:

{"number":1234, "names": [ 
  {"firstName": "John", "lastName": "Smith"}, 
  {"firstName": "Al", "lastName": "Jones"}
]},  
...

And I want to query for hits with number 1234 but return only the names that match "lastName": "Jones", so that my result omits names that don't match. In other words, I want to get back only part of the matching document, based on a term query or similar.

A simple nested query won't do, because it only filters top-level results. Any ideas?

{ "query" : { "bool": { "filter":[
    { "term": { "number":1234} },
    ????  something with "lastName": "Jones" ????
] } } }

I want back:

hits: [
   {"number":1234, "names": [ 
     {"firstName": "Al", "lastName": "Jones"}
   ]},  
   ...
]

Upvotes: 22

Views: 23142

Answers (3)

Taras Kohut

Reputation: 2555

The hits section returns _source, which is exactly the document you indexed.

You are right that a nested query filters top-level results, but with inner_hits it also shows which nested objects caused each top-level document to be returned, and this is exactly what you need.

The names field can be excluded from the top-level hits using the _source parameter.

{
   "_source": {
      "excludes": ["names"]
   },
   "query":{
      "bool":{
         "must":[
            {
               "term":{
                  "number":{
                     "value":"1234"
                  }
               }
            },
            {
               "nested":{
                  "path":"names",
                  "query":{
                     "term":{
                        "names.lastName":"Jones"
                     }
                  },
                  "inner_hits":{
                  }
               }
            }
         ]
      }
   }
}

Top-level documents are now returned without the names field, and each hit has an additional inner_hits section containing the names that match.
You should treat nested objects as part of their top-level document. If you really need them to be separate, consider parent/child relations.
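To get back the exact shape asked for in the question, the inner hits can be merged into each top-level hit on the client side. Below is a minimal sketch in Python. The function name `names_from_inner_hits` and the hand-written `sample_response` are hypothetical, and the sketch assumes each inner hit's `_source` holds the nested object itself (the layout may differ between Elasticsearch versions, so check a real response first).

```python
def names_from_inner_hits(response, path="names"):
    """Rebuild each top-level hit with only the nested objects that matched.

    Assumes the search response layout:
    hits.hits[i].inner_hits[path].hits.hits[j]._source
    """
    results = []
    for hit in response["hits"]["hits"]:
        # Top-level fields; "names" was excluded via the _source parameter.
        doc = dict(hit.get("_source", {}))
        inner = hit.get("inner_hits", {}).get(path, {})
        matched = inner.get("hits", {}).get("hits", [])
        # Keep only the nested objects reported as matching.
        doc[path] = [m["_source"] for m in matched]
        results.append(doc)
    return results


# Hand-written sample mimicking the relevant part of a response (assumption).
sample_response = {
    "hits": {"hits": [{
        "_source": {"number": 1234},
        "inner_hits": {"names": {"hits": {"hits": [
            {"_source": {"firstName": "Al", "lastName": "Jones"}}
        ]}}},
    }]}
}

reshaped = names_from_inner_hits(sample_response)
```

Here `reshaped` is a list of documents in which `names` contains only the matching entries.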

Upvotes: 31

Mohammad Akbari

Reputation: 4766

Try something like this. Note that the filtered query was removed in Elasticsearch 5.0, so the filters go in a bool query with a filter clause, and the nested path has to be your nested field (names in the question):

{
   "query": {
      "bool": {
         "filter": [
            { "term": { "number": 1234 } },
            {
               "nested": {
                  "path": "names",
                  "query": {
                     "term": {
                        "names.lastName": "Jones"
                     }
                  },
                  "inner_hits": {}
               }
            }
         ]
      }
   }
}

I used this reference.

Upvotes: 4

Phillip Bauman

Reputation: 51

Similar, but a bit different: put the nested query in a should clause and then look at inner_hits for the names. This returns the top-level doc whether or not any name matches, and inner_hits will contain whichever names did match.

{
   "_source": {
      "excludes": ["names"]
   },
   "query":{
      "bool":{
         "must":[
            {
               "term":{
                  "number":{
                     "value":"1234"
                  }
               }
            }
         ],
         "should":[
            {
               "nested":{
                  "path":"names",
                  "query":{
                     "term":{
                        "names.lastName":"Jones"
                     }
                  },
                  "inner_hits":{
                  }
               }
            }
         ]
      }
   }
}
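The only difference between this query and the must-based one in the earlier answer is where the nested clause sits. A small sketch in Python makes that explicit by building either request body from the same parts. The helper `build_query` and its `require_match` flag are hypothetical names introduced for illustration; the field names come from the question.

```python
import json


def build_query(number, last_name, require_match=True):
    """Build the search body; the nested clause goes in must or should."""
    nested = {
        "nested": {
            "path": "names",
            "query": {"term": {"names.lastName": last_name}},
            "inner_hits": {},
        }
    }
    bool_query = {"must": [{"term": {"number": {"value": number}}}]}
    if require_match:
        # Document is returned only if at least one name matches.
        bool_query["must"].append(nested)
    else:
        # Document is returned either way; inner_hits may be empty.
        bool_query["should"] = [nested]
    return {"_source": {"excludes": ["names"]}, "query": {"bool": bool_query}}


body = build_query("1234", "Jones", require_match=False)
print(json.dumps(body, indent=2))
```

With `require_match=True`, documents whose names contain no "Jones" are dropped; with `require_match=False`, they come back with an empty inner_hits section.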

Upvotes: 5
