Reputation: 2195
I have an index of jBPM usertasks. Each usertask has id, actualOwner and actualOwner.keyword, potentialOwners and potentialOwners.keyword, and processId fields.
I need this aggregation: the count of usertasks per process for a given user. A task belongs to a user if its actualOwner or one of its potentialOwners is that user. Only tasks in certain statuses should be counted.
This seems like a very easy query, but I cannot write it. I tried this:
POST jbpm-tasks/_search?size=1000
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "actualOwner.keyword": {
              "value": "vasya"
            }
          }
        },
        {
          "term": {
            "potentialOwners.keyword": {
              "value": "vasya"
            }
          }
        }
      ],
      "must": [
        {
          "terms": {
            "status.keyword": [
              "Created",
              "Reserved",
              "InProgress",
              "Ready"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "byProcesses": {
      "terms": { "field": "processId.keyword", "size": 1000000 }
    }
  }
}
As I understand it, this query does not filter by user; the should clauses only affect the score. So the user login does not really matter: for vasya and, for example, fedya the request returns the same aggregation results (with different scores for the tasks, but that is not interesting for now).
I do not understand how to write a filter for an aggregation that combines several conditions with OR and AND. All the examples I found use a single field.
How do I write this query properly?
Upvotes: 1
Views: 79
Reputation: 87
Which version are you using? The query syntax differs between versions.
You are right in your comment: minimum_should_match=1 is the way.
If a bool query has at least one should clause and no must or filter clauses, minimum_should_match defaults to 1, and the query behaves the way you originally expected.
Otherwise, the default is 0, so the should clauses no longer act as an OR: they only contribute to the score and do not filter out any documents.
POST jbpm-tasks/_search?size=1000
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "actualOwner.keyword": {
              "value": "vasya"
            }
          }
        },
        {
          "term": {
            "potentialOwners.keyword": {
              "value": "vasya"
            }
          }
        }
      ],
      "must": [
        {
          "terms": {
            "status.keyword": [
              "Created",
              "Reserved",
              "InProgress",
              "Ready"
            ]
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "aggs": {
    "byProcesses": {
      "terms": { "field": "processId.keyword", "size": 1000000 }
    }
  }
}
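To make the default behaviour concrete, here is a toy model of the matching rule in plain Python. It is only a sketch of the documented semantics, not Elasticsearch code: the helper names (bool_matches, term, terms, owned_by) and the sample task are made up for illustration, and scoring is ignored entirely.

```python
def bool_matches(doc, must=(), filter_=(), should=(), minimum_should_match=None):
    """Toy model of how a bool query decides whether a document matches."""
    if minimum_should_match is None:
        # Default is 1 only when there are should clauses and no must/filter clauses.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    required_ok = all(clause(doc) for clause in (*must, *filter_))
    should_hits = sum(1 for clause in should if clause(doc))
    return required_ok and should_hits >= minimum_should_match


def term(field, value):
    """Clause that matches when the field equals the value (or contains it, for a list)."""
    def clause(doc):
        actual = doc.get(field)
        return value in actual if isinstance(actual, list) else actual == value
    return clause


def terms(field, values):
    """Clause that matches when the field holds any of the given values."""
    def clause(doc):
        actual = doc.get(field)
        candidates = actual if isinstance(actual, list) else [actual]
        return any(candidate in values for candidate in candidates)
    return clause


ACTIVE = terms("status", ["Created", "Reserved", "InProgress", "Ready"])


def owned_by(user):
    """The two should clauses from the question, for a given user."""
    return [term("actualOwner", user), term("potentialOwners", user)]


# A hypothetical task that belongs to fedya, not vasya.
task = {"status": "Ready", "actualOwner": "fedya", "potentialOwners": ["fedya"]}

# With a must clause present, should is optional by default: the task still matches.
print(bool_matches(task, must=[ACTIVE], should=owned_by("vasya")))  # True
# Forcing at least one should clause to match turns should into a real OR filter.
print(bool_matches(task, must=[ACTIVE], should=owned_by("vasya"),
                   minimum_should_match=1))  # False
```

The first call mirrors the query from the question, which is why vasya and fedya produced the same aggregation; the second mirrors the query above.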
Upvotes: 0
Reputation: 1000
If you don't need to sort your data by relevance score, it is better to use filter. Besides, if you only need the aggregation results, it is wise to set size: 0 so that Elasticsearch returns no hits and only the aggs result. In my view, the query below should perform better for you:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "status.keyword": [
              "Created",
              "Reserved",
              "InProgress",
              "Ready"
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "actualOwner.keyword": {
                    "value": "vasya"
                  }
                }
              },
              {
                "term": {
                  "potentialOwners.keyword": {
                    "value": "vasya"
                  }
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  },
  "aggs": {
    "byProcesses": {
      "terms": {
        "field": "processId.keyword",
        "size": 1000000
      }
    }
  }
}
Overall, setting a very large size on an aggregation makes it very heavy. If you need to go through all the buckets an aggregation produces, a composite aggregation is a good option, because it lets you page through the buckets.
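As a rough sketch of that paging, the Python below builds the same filtered query with a composite aggregation and keeps requesting pages until no buckets come back. The function names, the "process" source name and the page size are my own choices, and search is a placeholder for whatever function you use to send a request body to jbpm-tasks/_search and get the parsed JSON response back; treat it as an illustration under those assumptions, not a drop-in client.

```python
def composite_body(user, statuses, after_key=None, page_size=1000):
    """Build a request body that returns one page of per-process buckets for a user."""
    composite = {
        "size": page_size,
        # "process" is just a label for the source; it becomes the key in each bucket.
        "sources": [{"process": {"terms": {"field": "processId.keyword"}}}],
    }
    if after_key is not None:
        # Continue after the last bucket of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,  # no hits, only aggregation results
        "query": {"bool": {"filter": [
            {"terms": {"status.keyword": statuses}},
            {"bool": {
                "should": [
                    {"term": {"actualOwner.keyword": user}},
                    {"term": {"potentialOwners.keyword": user}},
                ],
                "minimum_should_match": 1,
            }},
        ]}},
        "aggs": {"byProcesses": {"composite": composite}},
    }


def count_tasks_per_process(search, user, statuses, page_size=1000):
    """Collect all buckets page by page; `search` is a caller-supplied placeholder."""
    counts, after_key = {}, None
    while True:
        response = search(composite_body(user, statuses, after_key, page_size))
        agg = response["aggregations"]["byProcesses"]
        for bucket in agg["buckets"]:
            counts[bucket["key"]["process"]] = bucket["doc_count"]
        after_key = agg.get("after_key")
        # Stop when a page is empty or no continuation key is returned.
        if not agg["buckets"] or after_key is None:
            return counts
```

Each request stays small because only page_size buckets are built at a time, instead of asking a terms aggregation for up to 1000000 buckets at once.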
Upvotes: 2