Aggregation with OR filter

Question

I have an index of jBPM usertasks. Each usertask has id, actualOwner and actualOwner.keyword, potentialOwners and potentialOwners.keyword, and processId fields.

I need this aggregation: the count of usertasks per process for a given user. A usertask belongs to a user if its actualOwner or one of its potentialOwners is that user. Only tasks in certain statuses should be counted.

This seems like a very easy query, but I cannot get it right. I tried this:

POST jbpm-tasks/_search?size=1000
{
  "query": {
    "bool": {
      "should": [
        { 
          "term": {
            "actualOwner.keyword": {
              "value": "vasya"
            }
          }
        },
        {
          "term": {
            "potentialOwners.keyword": {
              "value": "vasya"
            }
          }
        }
      ],
      "must": [
        {
          "terms": {
            "status.keyword": [
              "Created",
              "Reserved",
              "InProgress",
              "Ready"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "byProcesses": {
      "terms": { "field": "processId.keyword", "size": 1000000 } 
    }
  }
}

As I understand it, this query does not filter by user; it only calculates a score. So the user login does not really matter here: a request for vasya and a request for, say, fedya return the same aggregation results (with different scores for the tasks, but that is not important for now).

I do not understand how to write a filter for an aggregation that combines several conditions with OR and AND. All the examples I have found use only one field...

How do I write this query properly?

Saeed Nasehi · Accepted Answer

If you don't need to sort your data by relevance score, it is better to use filter. Also, if you only need the aggregation results, it is wise to set size: 0 so that Elasticsearch does not return any documents and returns just the aggs result. Note that the OR part goes into a nested bool with minimum_should_match: 1, so at least one of the owner conditions must match. In my view, the query below should perform better for you:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "status.keyword": [
              "Created",
              "Reserved",
              "InProgress",
              "Ready"
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "actualOwner.keyword": {
                    "value": "vasya"
                  }
                }
              },
              {
                "term": {
                  "potentialOwners.keyword": {
                    "value": "vasya"
                  }
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  },
  "aggs": {
    "byProcesses": {
      "terms": {
        "field": "processId.keyword",
        "size": 1000000
      }
    }
  }
}
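
For comparison, the original query can also be made to filter by user with a one-line change. When a bool query has a must or filter clause, its should clauses are optional unless minimum_should_match is set. The body below is only a sketch of that variant, reusing the field names and values from the question. Unlike the filter version, it still computes scores.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "terms": { "status.keyword": ["Created", "Reserved", "InProgress", "Ready"] } }
      ],
      "should": [
        { "term": { "actualOwner.keyword": { "value": "vasya" } } },
        { "term": { "potentialOwners.keyword": { "value": "vasya" } } }
      ],
      "minimum_should_match": 1
    }
  },
  "aggs": {
    "byProcesses": {
      "terms": { "field": "processId.keyword", "size": 1000000 }
    }
  }
}
```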

Overall, setting a very large size on an aggregation makes it heavy. If you need to go through all the buckets an aggregation produces, a composite aggregation is a good option.
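
As a rough sketch of the composite approach, the request body below pages through the per-process counts instead of asking for them all at once. It reuses the filter from the query above. The page size of 1000 and the source name processId are assumptions chosen for illustration, not values from the question.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "terms": { "status.keyword": ["Created", "Reserved", "InProgress", "Ready"] } },
        {
          "bool": {
            "should": [
              { "term": { "actualOwner.keyword": { "value": "vasya" } } },
              { "term": { "potentialOwners.keyword": { "value": "vasya" } } }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  },
  "aggs": {
    "byProcesses": {
      "composite": {
        "size": 1000,
        "sources": [
          { "processId": { "terms": { "field": "processId.keyword" } } }
        ]
      }
    }
  }
}
```

Each response includes an after_key object under byProcesses. To fetch the next page, send the same request with "after" set to that object inside the composite block, and repeat until no buckets are returned.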
