Reputation: 1284
I have a very common problem: I need to show which users or document categories, all stored in a keyword column, are not present in a given time interval. I default to the Terms aggregation, which obviously does not return anything for the missing entries.
This is a very simple problem in a relational database: just do an outer join from the user table. In Kibana/Elasticsearch I cannot figure out how to solve it.
One way that works is to switch to Filter and then copy and paste all users into individual filter specifications. That, however, can't be maintained and does not scale across multiple reports.
I am fine with having to keep a single example document for each term, even if it's just a dummy. This would show all items when selecting the item in Kibana auto-complete, etc. If I could then get the results to always include at least one bucket for each of these terms, the problem would be solved.
Example: the Kibana Y axis is a simple count, while the X axis should show the users with the fewest entries. The report is set to show data for Period 2:
User   | Period 1    | Period 2 |
MR_X   | o o o o o o | o o o o  |
MISS_Y | o o o       | o        |
MR_Z   | o o o       |          |
MISS_W |             |          |
In this example, the report for Period 2 should at least show MISS_Y and MR_Z, as these are known in the dataset and have the fewest entries in Period 2. Some way to include MISS_W, which does not have any entries in the dataset, would be a bonus.
Upvotes: 1
Views: 897
Reputation: 2708
Apologies in advance if I've misunderstood your question. Aggregations provide a way to get different distributions of the documents in your result set. If you want different aggregations for different time intervals, you'll need your query to return results for all your time intervals, and you'll need to filter on different intervals within each of your aggregations.
For example, suppose you have a timestamp field that you are using to specify your time interval and a user field that you want to aggregate over. Then you could try structuring your Elasticsearch query as follows:
GET myindex/_search
{
  ...
  "aggs": {
    "period-2-distribution": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "now-1h"
          }
        }
      },
      "aggs": {
        "user-agg": {
          "terms": {
            "field": "user",
            "size": 1000
          }
        }
      }
    },
    "period-1-distribution": {
      "filter": {
        "range": {
          "timestamp": {
            "lt": "now-1h"
          }
        }
      },
      "aggs": {
        "user-agg": {
          "terms": {
            "field": "user",
            "size": 1000
          }
        }
      }
    }
  }
}
To reiterate, if you currently have a query before your aggs block, then you'll need to remove any clause from within query that specifies a time interval. This is admittedly a very invasive change to your query, and I appreciate it might break another one of your requirements. In that case you will need to take a different approach, but Elasticsearch is fairly flexible and should hopefully provide you with a way to get what you want.
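Once both per-period distributions come back in one response, the "outer join" from the question can be done on the client side. The following is only a rough sketch in Python, not something Kibana does for you. It assumes the response has the usual shape for a terms aggregation nested under a filter aggregation (a "buckets" list of "key"/"doc_count" pairs). The sample data, the helper names bucket_counts and least_active, and the extra_users parameter are made up for illustration, with counts taken from the example table in the question.

```python
# Hypothetical "aggregations" part of a search response for the two-period
# query. Users and counts are invented to match the question's example table.
sample_aggregations = {
    "period-1-distribution": {
        "doc_count": 12,
        "user-agg": {
            "buckets": [
                {"key": "MR_X", "doc_count": 6},
                {"key": "MISS_Y", "doc_count": 3},
                {"key": "MR_Z", "doc_count": 3},
            ]
        },
    },
    "period-2-distribution": {
        "doc_count": 5,
        "user-agg": {
            "buckets": [
                {"key": "MR_X", "doc_count": 4},
                {"key": "MISS_Y", "doc_count": 1},
            ]
        },
    },
}


def bucket_counts(aggregations, period_name, terms_name="user-agg"):
    """Return {term: doc_count} for one filtered period."""
    buckets = aggregations[period_name][terms_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def least_active(aggregations, report_period, reference_period, extra_users=()):
    """List (user, count) pairs for the report period, fewest entries first.

    Users that only appear in the reference period, or that are passed in
    extra_users (for example from a separate user list), get a count of 0.
    """
    report = bucket_counts(aggregations, report_period)
    reference = bucket_counts(aggregations, reference_period)
    known_users = set(report) | set(reference) | set(extra_users)
    rows = [(user, report.get(user, 0)) for user in known_users]
    # Sort by count, then by name so that ties have a stable order.
    return sorted(rows, key=lambda row: (row[1], row[0]))


result = least_active(
    sample_aggregations,
    report_period="period-2-distribution",
    reference_period="period-1-distribution",
    extra_users=["MISS_W"],  # not in the index at all, supplied by hand
)
print(result)
# [('MISS_W', 0), ('MR_Z', 0), ('MISS_Y', 1), ('MR_X', 4)]
```

Here MR_Z shows up with a count of 0 because it is known from Period 1, while MISS_W only appears because it was passed in explicitly; nothing in the index can reveal a user that has no documents at all.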
Upvotes: 1