fuzeto
fuzeto

Reputation: 247

How to filter by aggregation bucket?

I need a query that returns only result that has 1 bucket.

The query below returns me the access data of a visitor grouped by day.

{
    "size" :  0,
    "query" : {
        "filtered" : {
            "filter" : {
                "bool" : {
                    "must" : [
                        {
                            "range" : {
                                "start_time" : {
                                    "gte" : "2019-02-06 00:00:00",
                                    "lte" : "2019-02-11 23:59:59"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs" : {
        "UNIQUE" : {
            "terms" : {
                "size" : 0,
                "field" : "username"
            },
            "aggs" : {
                "visits" : {
                    "date_histogram" : {
                        "field" : "start_time",
                        "interval" : "day",
                        "format" : "yyyy-MM-dd"
                    }
                }
            }
        }
    }
}

I need to know which ones returned only once in the period. So when you have only 1 bucket, it's ONE. And if it has visited for more than a day (buckets> 1) then it is RECURRENT.

Upvotes: 0

Views: 157

Answers (2)

fuzeto
fuzeto

Reputation: 247

Solution found was:

{
    "size" :  0,
    "query" : {
        {
            "range" : {
                "start_time" : {
                    "gte" : "2019-02-11 00:00:00",
                    "lte" : "2019-02-11 23:59:59"
                }
            }
        }
    },
    "aggs" : {
        "UNIQUE" : {
            "terms" : {
                "size" : 0,
                "field" : "username"
            },
            "aggs":{
                "visit_date": {
                    "date_histogram": {
                        "field" : "start_time",
                        "interval" : "day",
                        "format" : "yyyy-MM-dd"
                    }
                },
                "count": {
                    "cardinality": {
                        "script": "new Date(doc['start_time'].value).format('yyyy-MM-dd')"
                    }
                },
                "equal_one":{  
                    "bucket_selector":{  
                        "buckets_path":{  
                            "count":"count.value"
                        },
                        "script":"count == 1"
                    }
                }
            }
        }
    }
}

But performance remains a problem. In an environment with about 1 million records this query does not work very well.

Maybe some query using Scripted Metrics would solve, but demand more analysis (doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html)

Upvotes: 0

Kamal Kunjapur
Kamal Kunjapur

Reputation: 8860

If I understand it correctly, you'd want a list of users who have had a unique date or like visited only once in a particular time frame and you'd want both the details, date and the username to be in the aggregation.

I've created a sample mapping, sample documents, aggregation query and how it would appear in the response

Mapping:

PUT mytest
{
  "mappings": {
    "mydocs": {
      "properties": {
        "username": {
          "type": "keyword"
        },
        "start_time": {
          "type": "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}

Sample Documents:

You can see that I've created 6 documents where John has visited twice on same date, Jack visits site on two different dates, while Jane and Rob visited only once in the time-frame for which I will write an aggregation.

POST mytest/mydocs/1
{
  "username": "john",
  "start_time": "2018-08-01"
}

POST mytest/mydocs/2
{
  "username": "john",
  "start_time": "2018-08-01"
}

POST mytest/mydocs/3
{
  "username": "jane",
  "start_time": "2018-08-01"
}

POST mytest/mydocs/4
{
  "username": "rob",
  "start_time": "2018-08-01"
}

POST mytest/mydocs/5
{
  "username": "jack",
  "start_time": "2018-08-01"
}

POST mytest/mydocs/6
{
  "username": "jack",
  "start_time": "2018-08-02"
}

Updated Aggregation Request

Note I've added two more documents with username Jack who visits the site on two different dates, username John visits the site twice on the same date.

POST mytest/_search
{
  "size": 0,
  "query": {
    "range": {
      "start_time": {
        "gte": "2017-08-01",
        "lte": "2019-08-01"
      }
    }
  },
  "aggs": {
    "myterms": {
      "terms": {
        "size": 100,
        "field": "username"
      },
      "aggs": {
        "visit_date": {
          "date_histogram": {
            "field": "start_time",
            "interval" : "day",
            "format" : "yyyy-MM-dd"
          }
        },
        "count": {
          "cardinality": {
            "field": "start_time"
          }
        },
        "equal_one":{  
          "bucket_selector":{  
            "buckets_path":{  
             "count":"count.value"
            },
            "script":"params.count == 1"
          }
        }
      }
    }
  }
}

Response

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "myterms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "john",
          "doc_count": 2,
          "count": {
            "value": 1
          },
          "visit_date": {
            "buckets": [
              {
                "key_as_string": "2018-08-01",
                "key": 1533081600000,
                "doc_count": 2
              }
            ]
          }
        },
        {
          "key": "jane",
          "doc_count": 1,
          "count": {
            "value": 1
          },
          "visit_date": {
            "buckets": [
              {
                "key_as_string": "2018-08-01",
                "key": 1533081600000,
                "doc_count": 1
              }
            ]
          }
        },
        {
          "key": "rob",
          "doc_count": 1,
          "count": {
            "value": 1
          },
          "visit_date": {
            "buckets": [
              {
                "key_as_string": "2018-08-01",
                "key": 1533081600000,
                "doc_count": 1
              }
            ]
          }
        }
      ]
    }
  }
}

You can see that John now appears in the result as expected even if he has visited site multiple times on same date.

Let me know if you have any queries.

Upvotes: 1

Related Questions