Alex
Alex

Reputation: 1947

Convert timestamps to datetime for use in Elasticsearch aggregations

I have an index of SendGrid event data:

"_source": {
    "externalId": "9283cc1d-b003-xxxx-a5af-84fcf31c4181",
    "email": "[email protected]",
    "timestamp": 1616515214,
    "event": "processed",
    "uid": null,
    "id": null,
    "sendgridEventId": null,
    "smtpId": null,
    "sgMessageId": null,
    "sgEventId": null,
    "sendgridEvent": null,
    "type": null,
    "category": [],
    "reason": null,
    "status": null,
    "url": null,
    "useragent": null,
    "ip": null,
    "response": null,
    "tls": null,
    "attempt": null,
    "sendAt": null,
    "asmGroupId": null
}

Now I like to aggregate all of these events for a given day using the timestamp attribute.

GET /sendgridevententity/_search
{
  "query":
   {
    "match_all": {}
   },
   "aggs": {
     "amount_per_day": {
       "date_histogram": {
         "field": "timestamp",
         "calendar_interval": "1d"
       }
     }
   }
}  

Unfortunately, this just yields all the single events as they all have a different timestamp and the aggregation does not group them by day.

How can I convert the timestamps to date and then run the aggregation?

Upvotes: 5

Views: 5308

Answers (1)

Joe - Check out my books
Joe - Check out my books

Reputation: 16895

You can take advantage of a multi field mapping. Here's how it works.

  1. Update the existing mapping with a new date "sub-field". I'm assuming timestamp was originally mapped as a long. I'm also assuming the timestamps are in epoch seconds, thereby the explicitly set format:
POST sendgridevententity/_mapping
{
  "properties": {
    "timestamp": {
      "type": "long",
      "fields": {
        "as_date": {
          "type": "date",
          "format": "epoch_second"
        }
      }
    }
  }
}
  1. This new property now needs to be picked up and your data needs to be reindexed. You can trigger a reindex call via a cool little trick — sending an empty _update_by_query request:
POST sendgridevententity/_update_by_query
  1. After the reindex operation finishes, you can target the new date field through the dot notation:
GET /sendgridevententity/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  },
  "aggs": {
    "amount_per_day": {
      "date_histogram": {
        "field": "timestamp.as_date",
        "format": "yyyy-MM-dd", 
        "calendar_interval": "1d"
      }
    }
  }
}

⚠️ Depending your index size and many other factors, the _update_by_query request may appear to time out. It's possible to set wait_for_completion=false which'll trigger an asynchronous background task instead.

💡 Note that I used size: 0 in the final request. It's a handy tool for returning only the aggregation results.

Upvotes: 6

Related Questions