Eran Moshe

Reputation: 3208

elasticsearch groupby and filter by regex condition

It's a bit hard for me to define the question as I'm not very experienced with Elasticsearch. I'm focusing the question on my specific problem:

Assuming I have the following records:

{
    "id": 1,
    "name": "bla1_1.aaa"
},
{
    "id": 1,
    "name": "bla1_2.bbb"
},
{
    "id": 2,
    "name": "bla2_1.aaa"
},
{
    "id": 2,
    "name": "bla2_2.aaa"
}

What I want is to get all the ids for which every name ends with .aaa.

I was thinking of grouping by id and then applying a regex such as .*\.aaa, so that every name in the group must match it.

In this particular example I would get id: 2 back.

How do I do it?

Let me know if there's anything I need to add to clarify the question.

Upvotes: 1

Views: 1533

Answers (1)

jaspreet chahal

Reputation: 9099

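A regexp query can be used.

The pattern .* matches any character any number of times, including zero. Note that a regexp query has to match the whole term, and that a bare . matches any character, so a literal dot must be escaped.

A terms aggregation on "id" then gives you the unique ids and the number of matching docs under each.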
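To illustrate the point about escaping the dot, here is a minimal Python sketch. It uses Python's re module as a stand-in for Elasticsearch's Lucene regexp engine (the two are not identical), and the name "bla_xaaa" is a made-up edge case that is not in the question's data.

```python
import re

# Python's re engine stands in for Elasticsearch's Lucene regexp here;
# fullmatch mimics the fact that a regexp query must match the whole term.
UNESCAPED = re.compile(r".*.aaa")   # the "." before "aaa" matches any character
ESCAPED = re.compile(r".*\.aaa")    # "\." matches only a literal dot


def matches(pattern, name):
    """Return True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None


# The last name is a hypothetical edge case, not part of the original data.
names = ["bla1_1.aaa", "bla1_2.bbb", "bla_xaaa"]
for name in names:
    print(name, matches(UNESCAPED, name), matches(ESCAPED, name))
```

With the unescaped pattern, "bla_xaaa" is accepted even though it has no ".aaa" suffix; the escaped pattern rejects it. Inside a JSON request body the backslash itself has to be escaped, so the pattern is written as ".*\\.aaa".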

Mapping:

PUT regex
{
  "mappings": {
    "properties": {
      "id":{
        "type":"integer"
      },
      "name":{
        "type":"text",
        "fields": {
          "keyword":{
            "type":"keyword"
          }
        }
      }
    }
  }
}

Data:

"hits" : [
      {
        "_index" : "regex",
        "_type" : "_doc",
        "_id" : "olQXjW0BywGFQhV7k84P",
        "_score" : 1.0,
        "_source" : {
          "id" : 1,
          "name" : "bla1_1.aaa"
        }
      },
      {
        "_index" : "regex",
        "_type" : "_doc",
        "_id" : "o1QXjW0BywGFQhV7us6B",
        "_score" : 1.0,
        "_source" : {
          "id" : 1,
          "name" : "bla1_2.bbb"
        }
      },
      {
        "_index" : "regex",
        "_type" : "_doc",
        "_id" : "pFQXjW0BywGFQhV77c6J",
        "_score" : 1.0,
        "_source" : {
          "id" : 2,
          "name" : "bla2_1.aaa"
        }
      },
      {
        "_index" : "regex",
        "_type" : "_doc",
        "_id" : "pVQYjW0BywGFQhV7Dc6F",
        "_score" : 1.0,
        "_source" : {
          "id" : 2,
          "name" : "bla2_2.aaa"
        }
      }
    ]

Query (the regexp matches names ending with .aaa; the dot is escaped so that it matches only a literal "."):

GET regex/_search
{
  "size":0,
  "query": {
        "regexp": {
            "name.keyword": {
                "value": ".*\\.aaa"
            }
        }
  },
  "aggs": {
    "unique_ids": {
      "terms": {
        "field": "id",
        "size": 10
      }
    }
  }
}

Result:

"hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "unique_ids" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 2,     ---> 2 matching docs under id 2
          "doc_count" : 2
        },
        {
          "key" : 1,     ---> 1 matching doc under id 1
          "doc_count" : 1
        }
      ]
    }
  }

Edit:

To return only the ids whose names all match, use a bucket selector that keeps a bucket only when the total number of docs for that id equals the number of docs matching the regexp. In the query below, totalCount counts all docs in the bucket, filter_agg > finalCount counts the docs that match the regexp, and mybucket_selector keeps the buckets where the two counts are equal.

GET regex/_search
{
  "size": 0,
  "aggs": {
    "unique_ids": {
      "terms": {
        "field": "id",
        "size": 10
      },
      "aggs": {
        "totalCount": {
          "value_count": {
            "field": "id"
          }
        },
        "filter_agg": {
          "filter": {
            "bool": {
              "must": [
                {
                  "regexp": {
                    "name.keyword": ".*\\.aaa"
                  }
                }
              ]
            }
          },
          "aggs": {
            "finalCount": {
              "value_count": {
                "field": "id"
              }
            }
          }
        },
        "mybucket_selector": {
          "bucket_selector": {
            "buckets_path": {
              "FinalCount": "filter_agg>finalCount",
              "TotalCount": "totalCount"
            },
            "script": "params.FinalCount==params.TotalCount"
          }
        }
      }
    }
  }
}
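The counting logic behind the bucket selector can be checked outside Elasticsearch. The following is only a sketch in plain Python over the four sample documents from the question; the function name ids_where_all_names_match is hypothetical, and Python's re module stands in for the Lucene regexp engine.

```python
import re
from collections import defaultdict

# The four sample documents from the question.
docs = [
    {"id": 1, "name": "bla1_1.aaa"},
    {"id": 1, "name": "bla1_2.bbb"},
    {"id": 2, "name": "bla2_1.aaa"},
    {"id": 2, "name": "bla2_2.aaa"},
]

# Names ending with a literal ".aaa" (whole-string match, like a regexp query).
SUFFIX = re.compile(r".*\.aaa")


def ids_where_all_names_match(docs, pattern):
    """Return the ids for which every document's name matches the pattern."""
    total = defaultdict(int)    # plays the role of totalCount per id bucket
    matched = defaultdict(int)  # plays the role of filter_agg > finalCount
    for doc in docs:
        total[doc["id"]] += 1
        if pattern.fullmatch(doc["name"]):
            matched[doc["id"]] += 1
    # Same condition as the bucket selector script: FinalCount == TotalCount.
    return sorted(i for i in total if matched[i] == total[i])


print(ids_where_all_names_match(docs, SUFFIX))  # prints [2]
```

Id 1 is dropped because only one of its two names matches, which mirrors how the bucket selector removes the bucket with key 1.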

Upvotes: 2
