Philip Shangguan
Philip Shangguan

Reputation: 537

Elasticsearch sub-aggregation with a condition

I have the database table columns like:

ID | Biz Name | License # | Violations | ...

I need to find out those businesses that have more than 5 violations.

I have the following:

{
   "query": {
       "bool": {
         "must": {
            "match": {
              "violations": {
                "query": "MICE DROPPINGS were OBSERVED",
                "operator": "and"
              }
            }
          },
          "must_not": {
            "match": {
              "violations": {
                "query": "NO MICE DROPPINGS were OBSERVED",
                "operator": "and"
              }
            }
          }
        }
      }
    },

    "aggs" : {
          "selected_bizs" :{
                 "terms" : {
                      "field" : "Biz Name.keyword",
                                "min_doc_count": 5,
                                "size" :1000
                           },
                      "aggs": {
                          "top_biz_hits": {
                          "top_hits": {
                              "size": 10
                              }
                          }
                      }
                 }
            }
       }
   

It seems working.

Now I need to find out those businesses that have 5 or more violations(like above), and also have 3 or more license #s.

I am not sure how to further aggregate this.

Thanks!

Upvotes: 2

Views: 1468

Answers (1)

Joe - Check out my books
Joe - Check out my books

Reputation: 16943

Let's assume that your License # field is defined just like the Biz Name and has a .keyword mapping.


Now, the statement:

find the businesses that have ... 3 or more license #s

can be rephrased as:

aggregate by the business name under the condition that the number of distinct values of the bucketed license IDs is greater or equal to 3.

With that being said, you can use the cardinality aggregation to get distinct License IDs.

Secondly, the mechanism for "aggregating under a condition" is the handy bucket_script aggregation which executes a script to determine whether the currently iterated bucket will be retained in the final aggregation.

Leveraging both of these in tandem would mean:

POST your-index/_search
{
  "size": 0, 
  "query": {
    "bool": {
      "must": {
        "match": {
          "violations": {
            "query": "MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "violations": {
            "query": "NO MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      }
    }
  },
  "aggs": {
    "selected_bizs": {
      "terms": {
        "field": "Biz Name.keyword",
        "min_doc_count": 5,
        "size": 1000
      },
      "aggs": {
        "top_biz_hits": {
          "top_hits": {
            "size": 10
          }
        },
        "unique_license_ids": {
          "cardinality": {
            "field": "License #.keyword"
          }
        },
        "must_have_min_3_License #s": {
          "bucket_selector": {
            "buckets_path": {
              "unique_license_ids": "unique_license_ids" 
            },
            "script": "params.unique_license_ids >= 3"
          }
        }
      }
    }
  }
}

and that's all there's to it!

Upvotes: 2

Related Questions