Shioshin

Reputation: 67

Aggregate document value per hour

I have a question about aggregations. I read about the Date Histogram Aggregation, but it seems to only group documents by date. I have an index visits with the fields date and visited_page, and I want to aggregate counts per hour (e.g. the number of visits to each page per hour). Should I use the aggregation above, or do I need to aggregate in a different way?

Upvotes: 0

Views: 1557

Answers (2)

deerawan

Reputation: 8443

The query should look like this:

GET {index_name}/{type}/_search
{
  "size": 0, // no need to display search result, can boost query speed
  "aggs": {
    "unique_visited_page": {
      "terms": {
        "field": "visited_page" // this must be indexed with keyword type
      },
      "aggs": {
        "visit_page_per_hour" : {
          "date_histogram" : {
            "field" : "date_field",
            "interval" : "hour"
          }
        }
      }
    }
  }
}

We aggregate by visited_page first. Then, within each visited_page bucket, the date_histogram sub-aggregation splits the documents into hourly buckets, which gives the count per page per hour.
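To make the two-level grouping concrete, here is a minimal plain-Python sketch of roughly what the nested aggregation computes. It does not talk to Elasticsearch. The function name and the sample documents are hypothetical and invented for illustration; only the field names date and visited_page come from the question.

```python
from collections import Counter, defaultdict
from datetime import datetime


def visits_per_page_per_hour(docs):
    """Group documents by visited_page, then count them per hour bucket."""
    result = defaultdict(Counter)
    for doc in docs:
        ts = datetime.fromisoformat(doc["date"])
        # Truncate the timestamp to the start of its hour, like an hourly histogram.
        hour_bucket = ts.replace(minute=0, second=0, microsecond=0)
        result[doc["visited_page"]][hour_bucket.isoformat()] += 1
    return {page: dict(counts) for page, counts in result.items()}


# Hypothetical documents, invented for illustration only.
docs = [
    {"date": "2018-07-24T13:05:00", "visited_page": "index.html"},
    {"date": "2018-07-24T14:10:00", "visited_page": "contact.html"},
    {"date": "2018-07-24T14:40:00", "visited_page": "contact.html"},
    {"date": "2018-07-24T15:20:00", "visited_page": "contact.html"},
]
counts = visits_per_page_per_hour(docs)
```

The outer dictionary plays the role of the terms buckets and each inner dictionary plays the role of the date_histogram buckets.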

Example response using my sample data:

{
  ...
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "unique_visited_page": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "contact.html",
          "doc_count": 2,
          "visit_page_per_hour": {
            "buckets": [
              {
                "key_as_string": "2018-07-24T14:00:00.000Z",
                "key": 1532440800000,
                "doc_count": 1
              },
              {
                "key_as_string": "2018-07-24T15:00:00.000Z",
                "key": 1532444400000,
                "doc_count": 1
              }
            ]
          }
        },
        {
          "key": "index.html",
          "doc_count": 1,
          "visit_page_per_hour": {
            "buckets": [
              {
                "key_as_string": "2018-07-24T13:00:00.000Z",
                "key": 1532437200000,
                "doc_count": 1
              }
            ]
          }
        },
        {
          "key": "page.html",
          "doc_count": 1,
          "visit_page_per_hour": {
            "buckets": [
              {
                "key_as_string": "2018-07-24T13:00:00.000Z",
                "key": 1532437200000,
                "doc_count": 1
              }
            ]
          }
        }
      ]
    }
  }
}

Each top-level key in the result is a visited_page value. Within each page, the documents are bucketed per hour, and every hourly bucket returns a doc_count. That doc_count is probably the value you want.
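If you need the counts as flat rows on the client side, something like the following sketch could work. It assumes the response has already been parsed into a Python dict and that the aggregation names are unique_visited_page and visit_page_per_hour as in the query; the helper name flatten_page_hour_counts is made up for this example, and the response dict is a trimmed copy of the one shown here.

```python
def flatten_page_hour_counts(response):
    """Turn the nested terms/date_histogram buckets into (page, hour, count) rows."""
    rows = []
    page_buckets = response["aggregations"]["unique_visited_page"]["buckets"]
    for page_bucket in page_buckets:
        for hour_bucket in page_bucket["visit_page_per_hour"]["buckets"]:
            rows.append(
                (page_bucket["key"], hour_bucket["key_as_string"], hour_bucket["doc_count"])
            )
    return rows


# Trimmed copy of the example response (only the fields the helper reads).
response = {
    "aggregations": {
        "unique_visited_page": {
            "buckets": [
                {
                    "key": "contact.html",
                    "doc_count": 2,
                    "visit_page_per_hour": {
                        "buckets": [
                            {"key_as_string": "2018-07-24T14:00:00.000Z", "doc_count": 1},
                            {"key_as_string": "2018-07-24T15:00:00.000Z", "doc_count": 1},
                        ]
                    },
                },
                {
                    "key": "index.html",
                    "doc_count": 1,
                    "visit_page_per_hour": {
                        "buckets": [
                            {"key_as_string": "2018-07-24T13:00:00.000Z", "doc_count": 1},
                        ]
                    },
                },
            ]
        }
    }
}
rows = flatten_page_hour_counts(response)
```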

Hope it helps.

Upvotes: 1

Bartors

Reputation: 190

It looks like you need a multi-bucket aggregation. I found the composite aggregation.

What you are interested in is this:

GET /_search
{
    "aggs" : {
        "my_buckets": {
            "composite" : {
                "sources" : [
                    { "date": { "date_histogram": { "field": "timestamp", "interval": "1d" } } },
                    { "product": { "terms": {"field": "product" } } }
                ]
            }
        }
    }
}

This will create composite buckets from the values produced by two value sources, a date_histogram and a terms. Each bucket is composed of two values, one for each value source defined in the aggregation. Any combination of sources is allowed, and the order in the array is preserved in the composite buckets. For the question above, you would use date and visited_page as the fields and an hourly interval instead of "1d".
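As a rough sketch of how this could be adapted to the question, the Python below builds a composite request body with an hourly interval and pages through the results using the after_key that composite responses return. The names build_composite_body and iter_composite_buckets are hypothetical, and search stands for any callable you supply that sends the body to your cluster and returns the parsed response. The "interval" key follows the example above; newer Elasticsearch versions expect calendar_interval or fixed_interval instead, so check this against your version.

```python
import copy


def build_composite_body(date_field="date", page_field="visited_page", page_size=100):
    """Build a composite aggregation over (hour, page); field names are assumptions."""
    return {
        "size": 0,
        "aggs": {
            "my_buckets": {
                "composite": {
                    "size": page_size,
                    "sources": [
                        {"hour": {"date_histogram": {"field": date_field, "interval": "1h"}}},
                        {"page": {"terms": {"field": page_field}}},
                    ],
                }
            }
        },
    }


def iter_composite_buckets(search, body):
    """Yield every composite bucket, following after_key until no buckets remain."""
    body = copy.deepcopy(body)  # do not mutate the caller's request body
    while True:
        agg = search(body)["aggregations"]["my_buckets"]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield from buckets
        after_key = agg.get("after_key")
        if after_key is None:
            return
        # Ask for the page of buckets that follows the last one we received.
        body["aggs"]["my_buckets"]["composite"]["after"] = after_key
```

Each yielded bucket has a key with one entry per source (here hour and page) and a doc_count, so the result is already flat, unlike a terms aggregation with a nested date_histogram.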

Does it help?

Upvotes: 0
