Elasticsearch sub-aggregation with a condition

Question

I have the database table columns like:

ID | Biz Name | License # | Violations | ...

I need to find out those businesses that have more than 5 violations.

I have the following:

{
   "query": {
       "bool": {
         "must": {
            "match": {
              "violations": {
                "query": "MICE DROPPINGS were OBSERVED",
                "operator": "and"
              }
            }
          },
          "must_not": {
            "match": {
              "violations": {
                "query": "NO MICE DROPPINGS were OBSERVED",
                "operator": "and"
              }
            }
          }
        }
      }
    },

    "aggs" : {
          "selected_bizs" :{
                 "terms" : {
                      "field" : "Biz Name.keyword",
                                "min_doc_count": 5,
                                "size" :1000
                           },
                      "aggs": {
                          "top_biz_hits": {
                          "top_hits": {
                              "size": 10
                              }
                          }
                      }
                 }
            }
       }

It seems working.

Now I need to find out those businesses that have 5 or more violations(like above), and also have 3 or more license #s.

I am not sure how to further aggregate this.

Thanks!

Joe - Check out my books · Accepted Answer

Let's assume that your License # field is defined just like the Biz Name and has a .keyword mapping.

Now, the statement:

find the businesses that have ... 3 or more license #s

can be rephrased as:

aggregate by the business name under the condition that the number of distinct values of the bucketed license IDs is greater or equal to 3.

With that being said, you can use the cardinality aggregation to get distinct License IDs.

Secondly, the mechanism for "aggregating under a condition" is the handy bucket_script aggregation which executes a script to determine whether the currently iterated bucket will be retained in the final aggregation.

Leveraging both of these in tandem would mean:

POST your-index/_search
{
  "size": 0, 
  "query": {
    "bool": {
      "must": {
        "match": {
          "violations": {
            "query": "MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "violations": {
            "query": "NO MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      }
    }
  },
  "aggs": {
    "selected_bizs": {
      "terms": {
        "field": "Biz Name.keyword",
        "min_doc_count": 5,
        "size": 1000
      },
      "aggs": {
        "top_biz_hits": {
          "top_hits": {
            "size": 10
          }
        },
        "unique_license_ids": {
          "cardinality": {
            "field": "License #.keyword"
          }
        },
        "must_have_min_3_License #s": {
          "bucket_selector": {
            "buckets_path": {
              "unique_license_ids": "unique_license_ids" 
            },
            "script": "params.unique_license_ids >= 3"
          }
        }
      }
    }
  }
}

and that's all there's to it!

Elasticsearch sub-aggregation with a condition

Answers (1)

Related Questions