Techi

Reputation: 1

post_filter after a top_hits aggregation is not working in Elasticsearch

Expectation: I need all users whose most recent attempt was not successful.

Actual/My approach: I aggregate by userId and use top_hits with a size of 1, sorted by time in descending order.

I prepared the query below. With it I can get every user and their last status, and I then want to filter on that status. I added a post_filter after the aggregation, but it still does not filter the results. Can anyone help me fix this? If there is a different approach, I would welcome that too.

Input:

[
  {
    "userId": "u1",
    "status": "Failure",
    "time": 1719543600008 // This is the most recent record for user u1
  },
  {
    "userId": "u1",
    "status": "Success",
    "time": 1719543600007
  },
  {
    "userId": "u1",
    "status": "Timeout",
    "time": 1719543600006
  },
  {
    "userId": "u2",
    "status": "Timeout",
    "time": 1719543600004 // This is the most recent record for user u2
  },
  {
    "userId": "u2",
    "status": "Failure",
    "time": 1719543600003
  },
  {
    "userId": "u3",
    "status": "Success",
    "time": 1719543600002 // This is the most recent record for user u3. Since it is Success, it must be excluded from the output
  },
  {
    "userId": "u3",
    "status": "Failure",
    "time": 1719543600001
  }
]

Expected Output:

[
  {
    "userId": "u1",
    "status": "Failure",
    "time": 1719543600008
  },
  {
    "userId": "u2",
    "status": "Timeout",
    "time": 1719543600004
  }
]

Query:

{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "data.time": {
              "gte": "1719543600000",
              "lte": "1719584179015",
              "format": "epoch_millis"
            }
          }
        },
        {
          "query_string": {
            "query": "data.type:\"user-stats\""
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_userId": {
      "terms": {
        "field": "data.userId.keyword"
      },
      "aggs": {
        "users_last_status": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "data.time": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  },
  "post_filter": { // In this query this filter is not working
    "term": {
      "data.status.keyword": "failure"
    }
  }
}

Actual Output:

[
  {
    "userId": "u1",
    "status": "Failure",
    "time": 1719543600008
  },
  {
    "userId": "u2",
    "status": "Timeout",
    "time": 1719543600004
  },
  {
    "userId": "u3", // This should not appear in the output, since we only care about unsuccessful records.
    "status": "Success",
    "time": 1719543600002 
  }
]

Note: Because the number of users is unbounded, we want to avoid filtering on the application/client side, to reduce load.

Upvotes: 0

Views: 120

Answers (1)

Musab Dogan

Reputation: 3680

post_filter affects only the search hits, not the aggregation results. From the documentation:

Use the search API’s post_filter parameter. Search requests apply post filters only to search hits, not aggregations. You can use a post filter to calculate aggregations based on a broader result set, and then further narrow the results. https://www.elastic.co/guide/en/elasticsearch/reference/current/filter-search-results.html

Instead, you can add a terms query to query.bool.filter so that documents are filtered before the aggregation runs, like the following.

{
  "query":{
    "bool":{
      "filter":[
        {"range":{"data.time":{"gte":"1719543600000","lte":"1719584179015","format":"epoch_millis"}}},
        {"query_string":{"query":"data.type:\"user-stats\""}},
        {"terms":{"data.status.keyword":["Timeout","Failure"]}}
      ]
    }
  },
  "aggs": {...}
}
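One caveat with filtering in the query: top_hits then only sees non-Success documents, so for the sample data u3 would still come back, with its older Failure record (time 1719543600001). If the requirement is strictly "the latest attempt is not Success", a possible alternative is a bucket_selector pipeline aggregation that compares each user's latest time overall with their latest non-Success time. The following is a minimal sketch, not a tested solution: the aggregation names last_time, not_success, and keep_unsuccessful_last are made up for illustration, it assumes data.time is a numeric or date field, and it assumes a user never has two records with the same time. The // comments follow the question's annotation style and are not part of the request body.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {"range": {"data.time": {"gte": "1719543600000", "lte": "1719584179015", "format": "epoch_millis"}}},
        {"query_string": {"query": "data.type:\"user-stats\""}}
      ]
    }
  },
  "aggs": {
    "group_by_userId": {
      "terms": {"field": "data.userId.keyword"},
      "aggs": {
        // Hypothetical name: latest time across all of the user's records
        "last_time": {"max": {"field": "data.time"}},
        // Hypothetical name: latest time among the user's non-Success records
        "not_success": {
          "filter": {"bool": {"must_not": {"term": {"data.status.keyword": "Success"}}}},
          "aggs": {"last_time": {"max": {"field": "data.time"}}}
        },
        "users_last_status": {
          "top_hits": {"size": 1, "sort": [{"data.time": {"order": "desc"}}]}
        },
        // Hypothetical name: keep the bucket only if the latest record is itself non-Success
        "keep_unsuccessful_last": {
          "bucket_selector": {
            "buckets_path": {"last": "last_time", "lastNotSuccess": "not_success>last_time"},
            "script": "params.last == params.lastNotSuccess"
          }
        }
      }
    }
  }
}
```

On the sample input this should keep u1 (1719543600008 on both sides) and u2 (1719543600004 on both sides) and drop u3, whose latest record overall (1719543600002) is newer than its latest non-Success record (1719543600001). Note that a terms aggregation returns only the top 10 buckets by default, so with many users the size would need raising or the buckets would need to be paged some other way.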

Upvotes: 0
