Reputation: 1911
I am trying to work out if there is a difference between "filters" and "filtered queries" in Elasticsearch.
The two example requests below return the same results, when run against my index.
Are they actually different in some subtle way?
Is there a reason why one would be preferred over the other, in different situations?
DSL giving one top-level query, and one top-level filter:
GET /index/type/_search?_source
{
  "query": {
    "multi_match": {
      "query": "my dog has fleas",
      "fields": ["name", "keywords"]
    }
  },
  "filter": {
    "term": {"status": 2}
  }
}
DSL giving only a top-level query, using the filtered construct:
GET /index/type/_search?_source
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "my dog has fleas",
          "fields": ["name", "keywords"]
        }
      },
      "filter": {
        "term": {"status": 2}
      }
    }
  }
}
Upvotes: 7
Views: 11645
Reputation: 1124
Later versions of Elasticsearch have a filter clause in the bool query. This does not necessarily run the filter before the query: the overall query is rewritten and optimized as Elasticsearch sees fit, and the user has no real control over the order.
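As a rough sketch, the question's filtered query could be expressed with the bool query's filter clause along these lines. The helper name bool_query_with_filter is hypothetical; the field names and values are taken from the question. The body is built as a plain Python dict and printed as JSON, without assuming any particular client library.

```python
import json


def bool_query_with_filter(text, fields, status):
    """Build a search body that uses the bool query's filter clause.

    Hypothetical helper for illustration only.
    """
    return {
        "query": {
            "bool": {
                # Clauses under "must" contribute to the relevance score.
                "must": [{"multi_match": {"query": text, "fields": fields}}],
                # Clauses under "filter" only include or exclude documents;
                # they do not contribute to the score.
                "filter": [{"term": {"status": status}}],
            }
        }
    }


body = bool_query_with_filter("my dog has fleas", ["name", "keywords"], 2)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request in the same way as the examples in the question.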
The only way to control the order is to use post_filter, which runs only on the results of the query. Performance-wise, this only pays off if the filter is very expensive and the query is cheap. It is also useful when you don't want the filter to influence aggregations, since aggregations run on the results of the query, before post_filter is applied. Some e-commerce searches use this to, for example, show only in-stock products in the hits when you select that option, while still counting both in-stock and out-of-stock products in the aggregations.
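A minimal sketch of such a request body, assuming a boolean field called in_stock and an aggregation named stock_counts (both names are made up for illustration, as is the helper name):

```python
import json


def search_with_post_filter(text, fields):
    """Build a search body where the filter is applied after aggregations.

    Hypothetical helper; "in_stock" and "stock_counts" are assumed names.
    """
    return {
        # The query decides which documents are scored and aggregated.
        "query": {"multi_match": {"query": text, "fields": fields}},
        # The aggregation counts in-stock and out-of-stock matches alike.
        "aggs": {"stock_counts": {"terms": {"field": "in_stock"}}},
        # post_filter narrows only the returned hits, not the aggregation.
        "post_filter": {"term": {"in_stock": True}},
    }


post_filter_body = search_with_post_filter("my dog has fleas", ["name", "keywords"])
print(json.dumps(post_filter_body, indent=2))
```

With a body like this, the hits would contain only in-stock products, while the stock_counts buckets would still reflect every document that matched the query.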
If you need more info on Elasticsearch query-building and/or performance, feel free to check out our Elasticsearch training (disclaimer: I'm one of the instructors).
Upvotes: -1
Reputation: 62648
The first example is a post_filter, which is sub-optimal from a performance perspective. Filtered queries are preferred, since the filters will be run prior to the queries. Typically, you want your filters to run first, since scoring documents is more expensive than just a boolean pass/fail. That way, your result set is cut down before you run your query on it. With a post_filter, your query is run first, the entire result set is scored, and then the filter is applied to the results.
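To make the ordering argument concrete, here is a toy in-memory model of the two approaches. It is not how Elasticsearch or Lucene actually executes a search; the documents, the scoring function, and the function names are all invented for illustration. It only shows that both orderings return the same hits while the post-filter ordering scores more documents.

```python
def score(doc, terms):
    """Toy relevance: number of query terms that appear in the name."""
    words = doc["name"].split()
    return sum(1 for term in terms if term in words)


def filtered_query(docs, terms, status):
    """Filter first, then score only the documents that passed the filter."""
    candidates = [d for d in docs if d["status"] == status]
    scored = [(score(d, terms), d["id"]) for d in candidates]
    hits = sorted(((s, i) for s, i in scored if s > 0), reverse=True)
    return hits, len(scored)


def post_filtered(docs, terms, status):
    """Score every document, then apply the filter to the scored results."""
    scored = [(score(d, terms), d) for d in docs]
    hits = sorted(
        ((s, d["id"]) for s, d in scored if s > 0 and d["status"] == status),
        reverse=True,
    )
    return hits, len(scored)


docs = [
    {"id": 1, "name": "my dog has fleas", "status": 2},
    {"id": 2, "name": "dog food", "status": 1},
    {"id": 3, "name": "cat has fleas", "status": 2},
    {"id": 4, "name": "fish tank", "status": 1},
]
terms = ["my", "dog", "has", "fleas"]

hits_a, scored_a = filtered_query(docs, terms, 2)
hits_b, scored_b = post_filtered(docs, terms, 2)
print(hits_a == hits_b, scored_a, scored_b)
```

In this toy data set the filter-first version scores two documents and the post-filter version scores all four, yet both return the same ranked hits, which matches what the question observed.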
The top-level filter directive was deprecated in 1.0, and was renamed to post_filter to clarify its purpose and usage.
"the top-level filter parameter in search has been renamed to post_filter, to indicate that it should not be used as the primary way to filter search results (use a filtered query instead), but only to filter results AFTER facets/aggregations have been calculated."
http://www.elastic.co/guide/en/elasticsearch/reference/current/_search_requests.html
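As a sketch of what "use a filtered query instead" means for the request in the question, a small helper could fold a legacy top-level filter into a filtered query. The function name to_filtered_query is hypothetical, and the match_all fallback for a body with no query is an assumption of this sketch.

```python
import copy


def to_filtered_query(body):
    """Return a copy of `body` with a top-level "filter" moved into a
    "filtered" query. Hypothetical helper for illustration only.
    """
    new_body = copy.deepcopy(body)
    top_filter = new_body.pop("filter", None)
    if top_filter is None:
        # Nothing to move; return the copy unchanged.
        return new_body
    new_body["query"] = {
        "filtered": {
            # Assumption: fall back to match_all when no query was given.
            "query": new_body.get("query", {"match_all": {}}),
            "filter": top_filter,
        }
    }
    return new_body


legacy = {
    "query": {
        "multi_match": {
            "query": "my dog has fleas",
            "fields": ["name", "keywords"],
        }
    },
    "filter": {"term": {"status": 2}},
}
converted = to_filtered_query(legacy)
```

Note that this changes behavior for any aggregations in the request, since a filter inside the query restricts what they see. The filtered query was itself deprecated in Elasticsearch 2.0 and removed in 5.0 in favor of the bool query's filter clause, so this applies to the 1.x versions discussed here.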
Upvotes: 18