Reputation: 135
I am running an ES query step by step with different offsets and limits, for example 100 to 149, then 150 to 199, then 200 to 249, and so on. When offset + limit goes above 10,000, I get the exception below:
{
  "error": {
    "root_cause": [
      {
        "type": "query_phase_execution_exception",
        "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "xyz",
        "node": "123",
        "reason": {
          "type": "query_phase_execution_exception",
          "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."
        }
      }
    ]
  },
  "status": 500
}
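For reference, the request is plain from/size paging; a representative example (the index name is taken from the error above, and match_all stands in for my real query):

GET /xyz/_search
{
  "from": 10000,
  "size": 50,
  "query": { "match_all": {} }
}

Any request where from + size exceeds 10,000 fails this way.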
I know we can solve this by increasing "max_result_window", and when I tried it (first 15,000, then 30,000) it did help. But I am not allowed to make index-level changes, so I changed it back to the default of 10,000.
How can I solve this problem? The query is triggered by an API call.
Upvotes: 1
Views: 4591
Reputation: 135
Two approaches worked for me.
The first approach was to raise the result window via the index settings API:
PUT /index/_settings
{ "max_result_window" : 10000 }
This worked and solved my problem, but the number of records is dynamic and growing fast, so it is not sustainable to keep increasing this window. Also, in my case the index is shared, so the change would affect every user and group on that index. So we moved on to the second approach.
Second approach, Part 1: First I applied a filter on the last-update timestamp; if the record count was greater than 10K, I divided the time frame in half and kept halving until the count dropped below 10K (see the count sketch after this list).
Part 2: Since the same data is also available in OLTP, I got the complete list of a unique identifier and sorted it. Then I filtered on that identifier and fetched data only in ranges of 10K. Once 10K records were fetched with pagination, I changed the filter and moved on to the next batch of 10K.
Part 3: I applied sorting on the last-updated timestamp and started fetching data with pagination. Once the record count reaches 10K, take the timestamp of the 9,999th record, apply a greater-than filter on that timestamp, and fetch the next 10K records.
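A minimal sketch of the Part 1 count check (the field name last_updated and the date values are placeholders for my actual mapping):

GET /xyz/_count
{
  "query": {
    "range": {
      "last_updated": {
        "gte": "2021-01-01T00:00:00Z",
        "lt": "2021-01-02T00:00:00Z"
      }
    }
  }
}

If "count" in the response is above 10K, split the interval in half and repeat on each half.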
All of these solutions helped me, but I settled on Part 3 of the second approach, as it is easy to implement and quickly gives sorted data. A minimal sketch of that pattern (again assuming a last_updated field; the gt value is the timestamp of the last record from the previous batch):
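GET /xyz/_search
{
  "size": 1000,
  "sort": [ { "last_updated": "asc" } ],
  "query": {
    "range": { "last_updated": { "gt": "2021-01-01T10:15:30Z" } }
  }
}

Because from stays at 0 and only the range filter advances, from + size never crosses the 10K window.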
Upvotes: 1
Reputation: 1336
Consider the scroll API - https://www.elastic.co/guide/en/elasticsearch/reference/2.2/search-request-scroll.html
This is also suggested in the manual and in the error message itself.
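A rough sketch of a scrolled fetch (the keep-alive of 1m and page size of 1000 are arbitrary choices, and match_all stands in for the real query):

POST /xyz/_search?scroll=1m
{
  "size": 1000,
  "query": { "match_all": {} }
}

Then keep requesting subsequent pages with the _scroll_id from each response until no more hits come back:

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}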
Upvotes: 1