Reputation: 6876
I have a fleet of devices that push to ElasticSearch at regular intervals (let's say every 10 minutes) entries of this form:
{
"deviceId": "unique-device-id",
"timestamp": 1586390031,
"payload" : { various data }
}
I usually look at this through Kibana by filtering for the last 7 days of data and then drilling down by device id or some other piece of data from the payload.
Now I'm trying to get a sense of the health of this fleet by finding devices that haven't reported anything in the last hour let's say. I've been messing around with all sorts of filters and visualisations and the closest I got to this is a data table with device ids and the timestamp of the last entry for each, sorted by timestamp. This is useful but is somewhat hard to work with as I have a few thousand devices.
What I dream of is getting either the above mentioned table to contain only the device ids that have not reported in the last hour, or getting only two numbers: the total count of distinct device ids seen in the last 7 days and the total count of device ids not seen in the last hour.
Can you point me in the right direction, if any one of these is even possible?
Upvotes: 1
Views: 520
Reputation: 16943
I'll skip the table and take the second approach -- only getting the counts. I think it's possible to walk your way backwards to the rows from the counts.
Note: I'll be using a human readable time format instead of timestamps but epoch_seconds
will work just as fine in your real use case. Also, I've added the comment
field to give each doc some background.
First, set up a your index:
PUT fleet
{
"mappings": {
"properties": {
"timestamp": {
"type": "date",
"format": "epoch_second||yyyy-MM-dd HH:mm:ss"
},
"comment": {
"type": "text"
},
"deviceId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Sync a few docs -- I'm in UTC+2 so I chose these timestamps:
POST fleet/_doc
{
"deviceId": "asdjhfa343",
"timestamp": "2020-04-05 10:00:00",
"comment": "in the last week"
}
POST fleet/_doc
{
"deviceId": "asdjhfa343",
"timestamp": "2020-04-10 13:05:00",
"comment": "#asdjhfa343 in the last hour"
}
POST fleet/_doc
{
"deviceId": "asdjhfa343",
"timestamp": "2020-04-10 12:05:00",
"comment": "#asdjhfa343 in the 2 hours"
}
POST fleet/_doc
{
"deviceId": "asdjhfa343sdas",
"timestamp": "2020-04-07 09:00:00",
"comment": "in the last week"
}
POST fleet/_doc
{
"deviceId": "asdjhfa343sdas",
"timestamp": "2020-04-10 12:35:00",
"comment": "in last 2hrs"
}
In total, we've got 5 docs and 2 distinct device ids w/ the following conditions
so I'm interested in finding precisely 1 deviceId
which has appeared in the last 2hrs BUT not last 1hr.
Using a combination of filter (for range filters), cardinality (for distinct counts) and bucket script (for count differences) aggregations.
GET fleet/_search
{
"size": 0,
"aggs": {
"distinct_devices_last7d": {
"filter": {
"range": {
"timestamp": {
"gte": "now-7d"
}
}
},
"aggs": {
"uniq_device_count": {
"cardinality": {
"field": "deviceId.keyword"
}
}
}
},
"not_seen_last1h": {
"filter": {
"range": {
"timestamp": {
"gte": "now-2h"
}
}
},
"aggs": {
"device_ids_per_hour": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day",
"format": "'disregard' -- yyyy-MM-dd"
},
"aggs": {
"total_uniq_count": {
"cardinality": {
"field": "deviceId.keyword"
}
},
"in_last_hour": {
"filter": {
"range": {
"timestamp": {
"gte": "now-1h"
}
}
},
"aggs": {
"uniq_count": {
"cardinality": {
"field": "deviceId.keyword"
}
}
}
},
"uniq_difference": {
"bucket_script": {
"buckets_path": {
"in_last_1h": "in_last_hour>uniq_count",
"in_last2h": "total_uniq_count"
},
"script": "params.in_last2h - params.in_last_1h"
}
}
}
}
}
}
}
}
The date_histogram
aggregation is just a placeholder that enables us to use a bucket script
to get the final difference and not have to do any post-processing.
Since we passed size: 0
, we're not interested in the hits
section. So taking only the aggregations, here are the annotated results:
...
"aggregations" : {
"not_seen_last1h" : {
"doc_count" : 3,
"device_ids_per_hour" : {
"buckets" : [
{
"key_as_string" : "disregard -- 2020-04-10",
"key" : 1586476800000,
"doc_count" : 3, <-- 3 device messages in the last 2hrs
"total_uniq_count" : {
"value" : 2 <-- 2 distinct IDs
},
"in_last_hour" : {
"doc_count" : 1,
"uniq_count" : {
"value" : 1 <-- 1 distict ID in the last hour
}
},
"uniq_difference" : {
"value" : 1.0 <-- 1 == final result !
}
}
]
}
},
"distinct_devices_last7d" : {
"meta" : { },
"doc_count" : 5, <-- 5 device messages in the last 7d
"uniq_device_count" : {
"value" : 2 <-- 2 unique IDs
}
}
}
Upvotes: 1