andreyro
andreyro

Reputation: 985

Elasticsearch - Applying multi level filter on nested aggregation bucket?

I'm, trying to get distinct nested objects by applying multiple filters.

Basically in Elasticsearch I have cities as top level document and inside I have nested citizens documents, which have another nested pets documents.

I am trying to get all citizens that have certain conditions applied on all of these 3 levels (cities, citizens and pets):

Give me all distinct citizens 
that have age:"40", 
that have pets "name":"Casper",
from cities with office_type="secondary" 

I know that to filter 1st level I can use query condition, and then if I need to filter the nested citizens I can add a filter in the aggregation level.

I am using this article as an example: https://iridakos.com/tutorials/2018/10/22/elasticsearch-bucket-aggregations.html

Query working so far:

GET city_offices/_search
{
  "size" : 10,
   "query": {
    "term" : { "office_type" : "secondary" } 
  },
  "aggs": {
      "citizens": {
        "nested": {
          "path": "citizens"
        },
        "aggs": {
          "inner_agg": {
            "filter": {
                "term": { "citizens.age": "40" }  
              } ,
              "aggs": {
                  "occupations": {
                    "terms": {
                      "field": "citizens.occupation"
                    }
                  }
              }
          }
        }
      }
    }
}

BUT: How can I add the "pets" nested filter condition?

Mapping:

PUT city_offices
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "doc": {
      "properties": {
        "city": {
          "type": "keyword"
        },
        "office_type": {
          "type": "keyword"
        },
        "citizens": {
          "type": "nested",
          "properties": {
            "occupation": {
              "type": "keyword"
            },
            "age": {
              "type": "integer"
            },
            "pets": {
              "type": "nested",
              "properties": {
                "kind": {
                  "type": "keyword"
                  },
                "name": {
                  "type": "keyword"
                },
                "age": {
                  "type": "integer"
                }
              }
            }
          }
        }
      }
    }
  }
}

Index data:

PUT /city_offices/doc/1
{
   "city":"Athens",
   "office_type":"secondary",
   "citizens":[      
      {
         "occupation":"Statistician",
         "age":30,
         "pets":[
            {
               "kind":"Cat",
               "name":"Phoebe",
               "age":14
            }
         ]
      },
      {
         "occupation":"Librarian",
         "age":30,
         "pets":[
            {
               "kind":"Rabbit",
               "name":"Nino",
               "age":13
            }
         ]
      },   
      {
         "occupation":"Librarian",
         "age":40,
         "pets":[
            {
               "kind":"Rabbit",
               "name":"Nino",
               "age":13
            }
         ]
      },      
      {
         "occupation":"Statistician",
         "age":40,
         "pets":[
            {
               "kind":"Rabbit",
               "name":"Casper",
               "age":2
            },
            {
               "kind":"Rabbit",
               "name":"Nino",
               "age":13
            },
            {
               "kind":"Dog",
               "name":"Nino",
               "age":15
            }
         ]
      }   
   ]
}

Upvotes: 0

Views: 958

Answers (1)

andreyro
andreyro

Reputation: 985

So I found a solution for this. Basically I apply top level filters in the query section and then apply rest of conditions in the aggregations.

First I apply citizens level filter aggregation, then I go inside nested pets and apply the filter and then I need to get back up to citizens level (using reverse_nested: citizens) and then set the term that will generate the final bucket.

Query looks like this:

GET city_offices/_search
{
  "size" : 10,
   "query": {
    "term" : { "office_type" : "secondary" } 
  },
  "aggs": {
      "citizens": {
        "nested": {
          "path": "citizens"
        },
        "aggs": {
          "inner": {
            "filter": {
                "term": { "citizens.age": "40" }  
              } ,
              "aggs": {
                  "occupations": {
                    "nested": {
                      "path": "citizens.pets"
                    },
                    "aggs": {
                      "inner_pets": {
                        "filter": {
                            "term": { "citizens.pets.name": "Casper" }  
                          } ,
                           "aggs": {
                             "lll": {
                               "reverse_nested": {
                                  "path": "citizens"
                                },
                                "aggs": {
                                   "xxx": {
                                      "terms": {
                                        "field": "citizens.occupation",
                                        "size": 10
                                      }
                                  }
                                }
                              }
                           }
                      }
                    }
                  }
              }
          }
        }
      }
    }
}

The response bucket looks like this:

              "xxx": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "Librarian",
                    "doc_count": 1
                  },
                  {
                    "key": "Statistician",
                    "doc_count": 1
                  }
                ]
              }

Any other suggestions?

Upvotes: 0

Related Questions