Alpcan Yıldız

Elasticsearch Scroll API: maximum scroll (keep-alive) time

We are using Elasticsearch 6.8. I want to use the Scroll API (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-scroll.html) with a scroll time such as scroll=1m. (1m is only an example; what I am asking about is the maximum value for x, in minutes or hours.)

What I am wondering about is this scroll keep-alive time. Each request made with the scrollId resets the timer, but what is the maximum value it can be set to, and is it bad to keep the scroll open for a very long time?

I want to use the scrollId to export 1-10 million records in batches, one batch per minute. If my system goes down for some reason, I want to continue where I stopped, so I want to keep the scroll alive as long as possible, provided it does not use much extra memory, CPU, etc. What is the maximum time I can keep the scroll alive, and what should it be? Or should I do this at all?

Thanks!

Answers (1)

jaspreet chahal

The maximum value for keeping a scroll context alive is 24h (24 hours) by default. This limit can be changed with the "search.max_keep_alive" cluster setting.

Setting a large value can increase the load on the shards, because every open scroll keeps its search context, and the segments that context uses, alive.
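To make the limit concrete, here is a small Python sketch (not part of any Elasticsearch client) that converts Elasticsearch-style time values such as 1m or 24h to milliseconds, checks a requested scroll keep-alive against the 24h default mentioned above, and builds the JSON body you could send with PUT /_cluster/settings to raise search.max_keep_alive. The helper names are hypothetical, only the ms/s/m/h/d units are handled, and treating a value exactly equal to the limit as allowed is an assumption.

```python
import re

# Subset of Elasticsearch time units (ms, s, m, h, d), converted to milliseconds.
# Other units such as micros/nanos are intentionally not handled in this sketch.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_TIME_RE = re.compile(r"(\d+)(ms|s|m|h|d)")


def parse_time_value(value: str) -> int:
    """Convert a time value such as "1m" or "24h" to milliseconds."""
    match = _TIME_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_MS[unit]


def check_scroll_keep_alive(scroll: str, max_keep_alive: str = "24h") -> None:
    """Raise ValueError if a requested scroll keep-alive exceeds the limit.

    "24h" is the default limit stated in the answer; pass the actual
    search.max_keep_alive value if your cluster overrides it.
    Assumption: a value equal to the limit is accepted.
    """
    if parse_time_value(scroll) > parse_time_value(max_keep_alive):
        raise ValueError(
            f"scroll={scroll} exceeds search.max_keep_alive={max_keep_alive}"
        )


def max_keep_alive_settings_body(value: str) -> dict:
    """Hypothetical helper: request body for PUT /_cluster/settings."""
    parse_time_value(value)  # reject malformed values before sending them
    return {"persistent": {"search.max_keep_alive": value}}
```

For example, check_scroll_keep_alive("25h") raises ValueError, while max_keep_alive_settings_body("48h") returns {"persistent": {"search.max_keep_alive": "48h"}}. Raising the limit only makes sense if you accept the extra resource cost described in the answer.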

From the documentation:

Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.

From the documentation:

Normally, the background merge process optimizes the index by merging together smaller segments to create new bigger segments, at which time the smaller segments are deleted. This process continues during scrolling, but an open search context prevents the old segments from being deleted while they are still in use. This is how Elasticsearch is able to return the results of the initial search request, regardless of subsequent changes to documents.

From the documentation:

Search contexts are automatically removed when the scroll timeout has been exceeded. However, keeping scrolls open has a cost, as discussed in the previous section, so scrolls should be explicitly cleared as soon as the scroll is not being used anymore using the clear-scroll API.
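For the export use case in the question, the keep-alive only has to be long enough to process one batch, because every scroll request resets the timer. The following Python sketch shows one way to structure such a loop so that the scroll is always cleared with DELETE /_search/scroll when it ends. The send(method, path, body) function is a hypothetical transport you would back with your HTTP client or the official client; the paths and bodies follow the 6.8 REST API as I understand it, and match_all is only a placeholder query.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def scroll_batches(send: Send, index: str, keep_alive: str = "1m",
                   size: int = 1000) -> Iterator[List[Dict[str, Any]]]:
    """Yield batches of hits and always clear the scroll context at the end."""
    # Initial search opens the scroll context with the given keep-alive.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": size, "query": {"match_all": {}}})
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # empty page: the scroll is exhausted
            yield hits
            # Each scroll request resets the keep-alive timer to keep_alive.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free the search context now instead of waiting for the timeout.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

If the consumer stops early, call close() on the generator (or wrap it in contextlib.closing) so the finally block runs and the context is cleared right away rather than lingering until the keep-alive expires.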
