Reputation: 534
I have an index of documents, each associated with a product_id. I would like to find all documents for a specific set of ids (around 100,000 product_ids to look up, out of 100 million in the index in total).
Would the filter query be the fastest and best option in that case?
{
  "query": {
    "bool": {
      "filter": {
        "terms": {"product_id": product_ids}
      }
    }
  }
}
Or is it better to split the ids into chunks and use a plain terms query, or something else?
The question is probably kind of a duplicate, but I would be very grateful for the best practice advice (and a bit of reasoning).
Upvotes: 1
Views: 8993
Reputation: 534
After some testing and more reading I found an answer:
A filter query works much faster than chunks sent as a plain terms query. But a really big filter can slow down getting the result a lot. In my case, using filter queries with chunks of 10,000 ids is 10 times faster than using one filter query with all 100,000 ids at once (btw, this number is already restricted in Elasticsearch 6).
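A minimal sketch of the chunking described above, assuming the ids are already in a Python list. The helper names `chunked` and `build_filter_queries` are made up for illustration, and the 10,000 chunk size is just the value that worked in my case, not a general recommendation. The sketch only builds the request bodies; sending them (for example through scroll) is left to whatever client you use.

```python
from typing import Any, Dict, Iterator, List, Sequence


def chunked(ids: Sequence[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def build_filter_queries(
    product_ids: Sequence[Any],
    chunk_size: int = 10_000,      # assumption: tune this for your cluster
    field: str = "product_id",
) -> List[Dict[str, Any]]:
    """Build one bool/filter/terms request body per chunk of ids."""
    return [
        {"query": {"bool": {"filter": {"terms": {field: chunk}}}}}
        for chunk in chunked(product_ids, chunk_size)
    ]
```

Each returned body can then be sent as its own search request, and the hits from all chunks merged on the client side.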
Also from official elasticsearch documentation:
Potentially the amount of ids specified in the terms filter can be a lot. In this scenario it makes sense to use the terms filter’s terms lookup mechanism.
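For reference, a terms lookup points the terms filter at a field of an already indexed document instead of listing the ids inline. A rough sketch of such a request body follows, assuming the ids were stored beforehand in a document of a separate index. The index name, document id and path passed in are hypothetical, and older Elasticsearch versions also expected a "type" key in the lookup, so check the documentation for your version. I did not benchmark this variant.

```python
from typing import Any, Dict


def build_terms_lookup_query(
    lookup_index: str,
    lookup_id: str,
    path: str,
    field: str = "product_id",
) -> Dict[str, Any]:
    """Build a filter whose terms are read from another document.

    `lookup_index` / `lookup_id` identify the document that holds the ids,
    and `path` is the field inside that document containing the id list.
    """
    return {
        "query": {
            "bool": {
                "filter": {
                    "terms": {
                        field: {
                            "index": lookup_index,
                            "id": lookup_id,
                            "path": path,
                        }
                    }
                }
            }
        }
    }
```

For example, `build_terms_lookup_query("product_id_lists", "batch_1", "ids")` would filter on the ids stored under `ids` in document `batch_1` of a (hypothetical) `product_id_lists` index.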
The only disadvantage to take into account is that filter queries are stored in the cache. (The cache implements an LRU eviction policy: when the cache becomes full, the least recently used data is evicted to make way for new data.)
P.S. In all cases I always used scroll.
Upvotes: 2
Reputation: 15296
You can use the "paging" or "scrolling" feature of Elasticsearch queries for very large result sets.
Use a "from / size" query: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html
or a "scroll" query: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
I think that "from / size" is the more efficient way to go, unless you want to return thousands of results each time (which could be many MB of data, so you probably don't want that).
Edit:
You can make a query like this in bulks:
GET my_index/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2", "3", .... "10000" ] // tune for the best array length
    }
  }
}
If your document id is sequential, or some other numeric form that you can easily order by, and you have it available as a field, you can do a "range query":
GET _search
{
  "query": {
    "range": {
      "document_id_that_is_a_number": {
        "gte": 0,     // bump this on each query by "lte" step factor
        "lte": 10000  // find a good number here
      }
    }
  }
}
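A small sketch of how the stepping could be generated, assuming the numeric id field spans a known interval. The function name `range_query_windows` is made up for illustration. Note that it uses a half-open window (`gte` / `lt`) rather than `gte` / `lte`, so that a document sitting exactly on a boundary is not returned by two consecutive queries.

```python
from typing import Any, Dict, Iterator


def range_query_windows(
    start: int,
    stop: int,
    step: int,
    field: str = "document_id_that_is_a_number",
) -> Iterator[Dict[str, Any]]:
    """Yield range query bodies covering [start, stop) in windows of `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    for lower in range(start, stop, step):
        upper = min(lower + step, stop)  # clamp the last window to `stop`
        yield {"query": {"range": {field: {"gte": lower, "lt": upper}}}}
```

Each yielded body is one search request; the step plays the same role as the array length in the terms version and needs the same kind of tuning.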
Upvotes: 0