Reputation: 1007
OK, this one will probably not be too hard for one of you super awesome Elasticsearch experts out there. I've got this nested query, and I want it to also be filtered on a non-nested field (status). I don't know where to put the filter. I tried putting it in a query (below), but that's not giving me the right results. Can you help me out?
{
  "aggs": {
    "status": {
      "terms": {
        "field": "status",
        "size": 0
      }
    }
  },
  "filter": {
    "nested": {
      "path": "participants",
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "user_id": 1
              }
            },
            {
              "term": {
                "archived": false
              }
            },
            {
              "term": {
                "has_unread": true
              }
            }
          ]
        }
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must_not": [
            {
              "term": {
                "status": 8
              }
            }
          ]
        }
      }
    }
  }
}
Upvotes: 8
Views: 17925
Reputation: 22342
There are a couple of moving pieces here:
The top-level filter that you are using is a "post filter", which is intended to remove things after the aggregation(s) have been processed. It's rather annoying that it exists that way, but it was deprecated back in the 0.90 days and it will be removed entirely in Elasticsearch 5.0. Its replacement is the more aptly named post_filter. You will most likely get better performance by putting it inside of the filtered query, not to mention it sounds like that is your goal anyway.
Your nested filter's terms are not using the full path to the field, which you should be doing.
{
  "term": {
    "user_id": 1
  }
}
Should be:
{
  "term": {
    "participants.user_id": 1
  }
}
The same follows for the rest of the nested objects.
Assuming you don't want the status to be 8, then you're doing that perfectly.
Using a size of 0 in the aggregation means that you are going to get everything back. This works fine with a smaller data set, but this would be painful on a larger one.
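To make the first point concrete, here is a rough pure-Python sketch of the order of operations (query, then aggregations, then post filter). It does not talk to Elasticsearch; the `docs` list, the `search` function, and the flattened `has_unread` field are all made up for illustration, and the nested participants are simplified to a single flag.

```python
from collections import Counter

# Hypothetical in-memory documents standing in for an index.
docs = [
    {"id": 1, "status": 1, "has_unread": True},
    {"id": 2, "status": 8, "has_unread": True},
    {"id": 3, "status": 1, "has_unread": False},
    {"id": 4, "status": 2, "has_unread": True},
]


def search(docs, query, post_filter=None):
    """Mimic the execution order: query -> aggregations -> post filter."""
    matched = [d for d in docs if query(d)]
    # The aggregation counts every document the query matched.
    status_counts = dict(Counter(d["status"] for d in matched))
    # The post filter only trims the hits that are returned afterwards.
    hits = [d for d in matched if post_filter is None or post_filter(d)]
    return hits, status_counts


def not_status_8(doc):
    return doc["status"] != 8


def unread(doc):
    return doc["has_unread"]


# Layout from the question: the unread condition acts as a post filter.
hits_a, aggs_a = search(docs, not_status_8, post_filter=unread)

# Suggested layout: both conditions live inside the query.
hits_b, aggs_b = search(docs, lambda d: not_status_8(d) and unread(d))
```

Both layouts return the same hits (documents 1 and 4), but in the first one the status counts still include document 3, because the aggregation ran before the post filter was applied.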
Putting it all together (order is irrelevant, but it's generally a good idea to put aggregations after the query portion because that's how it is executed):
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": {
            "nested": {
              "path": "participants",
              "filter": {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "participants.user_id": 1
                      }
                    },
                    {
                      "term": {
                        "participants.archived": false
                      }
                    },
                    {
                      "term": {
                        "participants.has_unread": true
                      }
                    }
                  ]
                }
              }
            }
          },
          "must_not": {
            "term": {
              "status": 8
            }
          }
        }
      }
    }
  },
  "aggs": {
    "status": {
      "terms": {
        "field": "status",
        "size": 0
      }
    }
  }
}
Note: I changed the "must_not" part from an array to a single object. There's nothing wrong with always using the array syntax; I only dropped it here to show that both formats work. Naturally, if you use more than one item, then you must use the array syntax.
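If you build the request body in code, one way to avoid forgetting the path prefix is to add it in a single place. The following is only a sketch using plain dictionaries and the standard json module; `nested_terms` and `build_body` are hypothetical helper names, not part of any Elasticsearch client, and the structure assumes the same pre-5.0 filtered query syntax used above.

```python
import json


def nested_terms(path, terms):
    """Build a nested filter whose term fields carry the full path."""
    return {
        "nested": {
            "path": path,
            "filter": {
                "bool": {
                    "must": [
                        # Prefix each field with the nested path.
                        {"term": {f"{path}.{field}": value}}
                        for field, value in terms.items()
                    ]
                }
            },
        }
    }


def build_body(user_id, excluded_status):
    """Assemble the request body: filtered query first, then the aggregation."""
    participant_terms = {
        "user_id": user_id,
        "archived": False,
        "has_unread": True,
    }
    return {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": nested_terms("participants", participant_terms),
                        # Single-object form; a list is also accepted.
                        "must_not": {"term": {"status": excluded_status}},
                    }
                }
            }
        },
        "aggs": {"status": {"terms": {"field": "status", "size": 0}}},
    }


body = build_body(user_id=1, excluded_status=8)
payload = json.dumps(body, indent=2)  # JSON text to send as the search body
```

Here `payload` should hold the same JSON as the request shown above, with the nested field names qualified automatically.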
Upvotes: 19