Reputation: 637
How can I get all the results from Elasticsearch? The results are limited to 10 by default. I have a query like:
@data = Athlete.search :load => true do
  size 15
  query do
    boolean do
      must { string q, {:fields => ["name", "other_names", "nickname", "short_name"], :phrase_slop => 5} }
      unless conditions.blank?
        conditions.each do |condition|
          must { eval(condition) }
        end
      end
      unless excludes.blank?
        excludes.each do |exclude|
          must_not { eval(exclude) }
        end
      end
    end
  end
  sort do
    by '_score', "desc"
  end
end
I have set the limit to 15, but I want to make it unlimited so that I can get all the results. I can't set a fixed limit because my data keeps changing and I want all of it.
Upvotes: 56
Views: 119340
Reputation: 6412
You can use search_after to paginate, and the Point in Time API to avoid having your data change while you paginate. Example with elasticsearch-dsl for Python:
from typing import List

from elasticsearch_dsl import Search
from elasticsearch_dsl.connections import connections
from elasticsearch_dsl.utils import AttrDict

# Set up a paginated query with search_after and a fixed point in time
elasticsearch = connections.create_connection(hosts=[elastic_host])
pit = elasticsearch.open_point_in_time(index=MY_INDEX, keep_alive="3m")
pit_id = pit["id"]
query_size = 500
search_after = [0]  # placeholder sort value for the first page
hits: List[AttrDict] = []
while query_size:
    if hits:
        search_after = hits[-1].meta.sort
    search = (
        Search()
        .extra(size=query_size)
        .extra(pit={"id": pit_id, "keep_alive": "5m"})
        .extra(search_after=search_after)
        .filter(filter_)  # filter_ is assumed to be a Q object defined elsewhere
        .sort("url.keyword")  # Note you need a unique field to sort on or it may never advance
    )
    response = search.execute()
    hits = [hit for hit in response]
    pit_id = response.pit_id  # the server may return a new PIT id on each page
    query_size = len(hits)
    for hit in hits:
        pass  # Do work with hits

# Clean up the point in time when finished (8.x clients take id=pit_id instead of body)
elasticsearch.close_point_in_time(body={"id": pit_id})
Upvotes: 0
Reputation: 28553
Use the scan method, e.g.:

curl -XGET 'localhost:9200/_search?search_type=scan&scroll=10m&size=50' -d '
{
  "query" : {
    "match_all" : {}
  }
}'

See the Elasticsearch scroll documentation for details.
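For reference, a rough equivalent from Python using the helpers.scan wrapper in the official elasticsearch client (the host and index name are assumptions for illustration):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

# Assumed host and index name, for illustration only
es = Elasticsearch("http://localhost:9200")

# helpers.scan wraps the scroll API and yields every matching document
for hit in scan(es, index="athletes", query={"query": {"match_all": {}}}, scroll="10m"):
    print(hit["_id"], hit["_source"])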
Upvotes: 7
Reputation: 8408
From the docs: "Note that from + size can not be more than the index.max_result_window index setting which defaults to 10,000". So my admittedly very ad-hoc solution is to just pass size: 10000, or 10,000 minus from if I use the from argument.
Note that, following Matt's comment below, the proper way to do this if you have a larger number of documents is to use the scroll API. I have used this successfully, but only with the Python interface.
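As a sketch of that cap with the Python client (the host and index name are assumptions):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed host

MAX_WINDOW = 10000  # default index.max_result_window
offset = 0          # whatever you would pass as `from`
response = es.search(
    index="athletes",  # assumed index name
    body={
        "query": {"match_all": {}},
        "from": offset,
        "size": MAX_WINDOW - offset,  # keep from + size within the window
    },
)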
Upvotes: 10
Reputation: 550
Another approach is to first do a searchType: 'count' query, and then do a normal search with size set to results.count.

The advantage here is that it avoids depending on a magic number for UPPER_BOUND, as suggested in this similar SO question, and avoids the extra overhead of building too large a priority queue that Shay Banon describes here. It also lets you keep your results sorted, unlike scan.
The biggest disadvantage is that it requires two requests. Depending on your circumstance, this may be acceptable.
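A rough sketch of the two-request idea with the Python client; searchType: 'count' is the old ES 1.x mechanism, so the _count API stands in for it here (the host and index name are assumptions):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed host
query = {"match_all": {}}

# First request: count how many documents match
total = es.count(index="athletes", body={"query": query})["count"]

# Second request: fetch exactly that many, still sorted by score
response = es.search(index="athletes", body={"query": query, "size": total})
hits = response["hits"]["hits"]

Note that if documents are indexed between the two requests, the count can be stale, and the max_result_window cap mentioned in the answer above still applies.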
Upvotes: 13
Reputation: 9721
You can use the from and size parameters to page through all your data. This could be very slow depending on your data and how much is in the index.
http://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html
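For illustration, a minimal from/size paging loop with the Python client (the host, index name, and page size are assumptions; from + size is still capped by index.max_result_window):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed host
page_size = 100
offset = 0

while True:
    response = es.search(
        index="athletes",  # assumed index name
        body={"query": {"match_all": {}}, "from": offset, "size": page_size},
    )
    hits = response["hits"]["hits"]
    if not hits:
        break
    for hit in hits:
        print(hit["_id"])  # do work with each hit
    offset += page_size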
Upvotes: 36