Reputation: 1
Expectation: I need all users whose last attempt was not successful.
Actual/My approach: I aggregate by userId and use top_hits with a size of 1, sorted by time in descending order.
With the query below I can get every user and their last status. After that I want to filter on the status. I applied post_filter after the aggregation, but it still does not filter. Can anyone help me fix this? Any other approach is welcome too.
Input:
[
  {
    "userId": "u1",
    "status": "Failure",
    "time": 1719543600008 // This is the most recent record for user u1
  },
  {
    "userId": "u1",
    "status": "Success",
    "time": 1719543600007
  },
  {
    "userId": "u1",
    "status": "Timeout",
    "time": 1719543600006
  },
  {
    "userId": "u2",
    "status": "Timeout",
    "time": 1719543600004 // This is the most recent record for user u2
  },
  {
    "userId": "u2",
    "status": "Failure",
    "time": 1719543600003
  },
  {
    "userId": "u3",
    "status": "Success",
    "time": 1719543600002 // This is the most recent record for user u3. As it is a success, it needs to be discarded from the output
  },
  {
    "userId": "u3",
    "status": "Failure",
    "time": 1719543600001
  }
]
Expected Output:
[
  {
    "userId": "u1",
    "status": "Failure",
    "time": 1719543600008
  },
  {
    "userId": "u2",
    "status": "Timeout",
    "time": 1719543600004
  }
]
Query:
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "data.time": {
              "gte": "1719543600000",
              "lte": "1719584179015",
              "format": "epoch_millis"
            }
          }
        },
        {
          "query_string": {
            "query": "data.type:\"user-stats\""
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_userId": {
      "terms": {
        "field": "data.userId.keyword"
      },
      "aggs": {
        "users_last_status": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "data.time": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  },
  "post_filter": { // In this query this filter is not working
    "term": {
      "data.status.keyword": "failure"
    }
  }
}
Actual Output:
[
  {
    "userId": "u1",
    "status": "Failure",
    "time": 1719543600008
  },
  {
    "userId": "u2",
    "status": "Timeout",
    "time": 1719543600004
  },
  {
    "userId": "u3", // This shouldn't come in output as we are concerned about only failure records.
    "status": "Success",
    "time": 1719543600002
  }
]
Note: Since there is no limit on the number of users, we don't want to filter on the application/client side; we want to keep the load there low.
Upvotes: 0
Views: 120
Reputation: 3680
post_filter only affects the query results (the search hits), not the aggregation results.
From the documentation: "Use the search API’s post_filter parameter. Search requests apply post filters only to search hits, not aggregations. You can use a post filter to calculate aggregations based on a broader result set, and then further narrow the results." https://www.elastic.co/guide/en/elasticsearch/reference/current/filter-search-results.html
You can use a terms query in query.bool.filter instead, like the following.
{
  "query": {
    "bool": {
      "filter": [
        {"range": {"data.time": {"gte": "1719543600000", "lte": "1719584179015", "format": "epoch_millis"}}},
        {"query_string": {"query": "data.type:\"user-stats\""}},
        // Field name matches the one used in the question; keyword terms are case-sensitive
        {"terms": {"data.status.keyword": ["Timeout", "Failure"]}}
      ]
    }
  },
  "aggs": {...}
}
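One caveat: a filter in the query removes the Success documents before the aggregation runs, so a user whose latest attempt succeeded but who has an older failure (u3 in the question) still gets a bucket, built from that older record. If that matters, a possible alternative is to compare two max values per user inside the terms aggregation and drop buckets with a bucket_selector. The following is only a sketch under assumptions: the aggregation names last_attempt_time, last_unsuccessful, time and keep_unsuccessful are made up for illustration, the terms size of 1000 is arbitrary, and the status values are taken from the sample input.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {"range": {"data.time": {"gte": "1719543600000", "lte": "1719584179015", "format": "epoch_millis"}}},
        {"query_string": {"query": "data.type:\"user-stats\""}}
      ]
    }
  },
  "aggs": {
    "group_by_userId": {
      "terms": {"field": "data.userId.keyword", "size": 1000}, // size is an assumption; the default is 10
      "aggs": {
        // Latest attempt of any status (hypothetical name)
        "last_attempt_time": {"max": {"field": "data.time"}},
        // Latest attempt that was not a success (hypothetical names)
        "last_unsuccessful": {
          "filter": {"terms": {"data.status.keyword": ["Failure", "Timeout"]}},
          "aggs": {"time": {"max": {"field": "data.time"}}}
        },
        "users_last_status": {
          "top_hits": {"size": 1, "sort": [{"data.time": {"order": "desc"}}]}
        },
        // Keep the bucket only if the latest attempt is itself unsuccessful
        "keep_unsuccessful": {
          "bucket_selector": {
            "buckets_path": {
              "lastAny": "last_attempt_time",
              "lastBad": "last_unsuccessful>time"
            },
            "script": "params.lastAny == params.lastBad"
          }
        }
      }
    }
  }
}
```

For the sample input this should keep u1 and u2 and drop u3, because u3's latest time (…002) differs from its latest unsuccessful time (…001). With the default gap_policy of skip, I would expect users with no unsuccessful documents at all to be dropped as well, but please verify that on your version. If a Success and a Failure share the same latest timestamp, the user is kept. The // comments follow the style used in the question; strip them if your client requires strict JSON.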
Upvotes: 0