Alex T

Reputation: 2195

Aggregation with OR filter

I have an index of jBPM usertasks. Each usertask has id, actualOwner and actualOwner.keyword, potentialOwners and potentialOwners.keyword, and processId fields.

I need this aggregation: the count of usertasks per process for a given user. A user task belongs to a user if its actualOwner or one of its potentialOwners is that user. The task must also be in one of several statuses.

This seems like a very easy query, but I cannot get it right. I tried this:

POST jbpm-tasks/_search?size=1000
{
  "query": {
    "bool": {
      "should": [
        { 
          "term": {
            "actualOwner.keyword": {
              "value": "vasya"
            }
          }
        },
        {
          "term": {
            "potentialOwners.keyword": {
              "value": "vasya"
            }
          }
        }
      ],
      "must": [
        {
          "terms": {
            "status.keyword": [
              "Created",
              "Reserved",
              "InProgress",
              "Ready"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "byProcesses": {
      "terms": { "field": "processId.keyword", "size": 1000000 } 
    }
  }
}

As I understand it, the should clauses here do not filter; they only affect the score. So the user login does not really matter for this query: a request for vasya and, for example, fedya returns the same aggregation results (with different scores for the tasks, but that is not interesting for now).

I do not understand how to write a filter for an aggregation that combines multiple conditions with OR and AND. All the examples always use a single field...

How do I write this query properly?

Upvotes: 1

Views: 79

Answers (2)

Hieast

Reputation: 87

Which version do you use? Different versions have different syntax.

You are right in your comment: minimum_should_match=1 is the way. If the bool query has at least one should clause and no must or filter clauses, the default minimum_should_match value is 1, and the query behaves as you originally expected.

Otherwise, the default minimum_should_match value is 0, which makes should clauses different from OR: they only contribute to the score and do not filter out any documents.

POST jbpm-tasks/_search?size=1000
{
  "query": {
    "bool": {
      "should": [
        { 
          "term": {
            "actualOwner.keyword": {
              "value": "vasya"
            }
          }
        },
        {
          "term": {
            "potentialOwners.keyword": {
              "value": "vasya"
            }
          }
        }
      ],
      "must": [
        {
          "terms": {
            "status.keyword": [
              "Created",
              "Reserved",
              "InProgress",
              "Ready"
            ]
          }
        }
      ],
      "minimum_should_match" : 1
    }
  },
  "aggs": {
    "byProcesses": {
      "terms": { "field": "processId.keyword", "size": 1000000 } 
    }
  }
}
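
The default described above can be modeled with a small toy in Python. This is only a sketch of the matching rule, not Elasticsearch code: `term`, `terms` and `matches` are hypothetical helpers, clauses are plain predicates, and must_not and scoring are left out.

```python
def term(field, value):
    """Predicate: the field equals the value, or contains it if multi-valued."""
    def check(doc):
        stored = doc.get(field)
        if isinstance(stored, list):
            return value in stored
        return stored == value
    return check


def terms(field, values):
    """Predicate: the (single-valued) field equals any of the given values."""
    return lambda doc: doc.get(field) in values


def matches(doc, bool_query):
    """Toy model of whether a bool query matches a document (no must_not)."""
    must = bool_query.get("must", [])
    filt = bool_query.get("filter", [])
    should = bool_query.get("should", [])
    # Default is 1 only when there are should clauses and no must/filter clauses.
    default_msm = 1 if should and not (must or filt) else 0
    msm = bool_query.get("minimum_should_match", default_msm)
    if not all(clause(doc) for clause in must + filt):
        return False
    return sum(1 for clause in should if clause(doc)) >= msm


owner_clauses = [term("actualOwner", "vasya"), term("potentialOwners", "vasya")]
status_clause = terms("status", ["Created", "Reserved", "InProgress", "Ready"])

original = {"should": owner_clauses, "must": [status_clause]}
fixed = {"should": owner_clauses, "must": [status_clause], "minimum_should_match": 1}

# Made-up sample documents for illustration only.
fedya_task = {"status": "Ready", "actualOwner": "fedya", "potentialOwners": ["fedya"]}
vasya_task = {"status": "Ready", "actualOwner": None, "potentialOwners": ["vasya", "fedya"]}
```

With these sample documents, `matches(fedya_task, original)` is true because the should clauses are optional next to a must clause, while `matches(fedya_task, fixed)` is false and `matches(vasya_task, fixed)` is true.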

Upvotes: 0

Saeed Nasehi

Reputation: 1000

If you don't need to sort your data by relevance score, it is better to use filter. Besides, if you only need the aggregation results, it is wise to set size: 0 so that Elasticsearch returns no hits, just the aggs result. In my view, the query below should perform better for you:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "status.keyword": [
              "Created",
              "Reserved",
              "InProgress",
              "Ready"
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "actualOwner.keyword": {
                    "value": "vasya"
                  }
                }
              },
              {
                "term": {
                  "potentialOwners.keyword": {
                    "value": "vasya"
                  }
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  },
  "aggs": {
    "byProcesses": {
      "terms": {
        "field": "processId.keyword",
        "size": 1000000
      }
    }
  }
}

Overall, setting a large size on an aggregation makes it very heavy. If you want to go through all the buckets an aggregation produces, the composite aggregation is a good option.

Upvotes: 2
