Lim H.

Reputation: 10060

What exactly does -1 refresh_interval in Elasticsearch mean?

I have read a lot of articles about index refreshing in Elasticsearch. I understand the implication of intervals greater than 0: the value is the time that elapses between consecutive segment flushes, which make documents available for search. However, I am not sure what exactly refresh_interval: -1 does. As I understand it, it disables automatic index refreshing, but not completely: Elasticsearch still flushes segments from time to time even when refresh_interval is set to -1. Which mechanism governs this flushing activity if automatic refresh is disabled?

Sorry, I know I don't have a lot of code to post, so I will give a bit of background on what I am after. My application doesn't need near real-time search; it only needs eventual consistency. However, the delay should be reasonable, i.e. from a few seconds to less than a minute, not half an hour. I was wondering whether I can leave it to Elasticsearch to decide when best to refresh at its convenience, rather than refreshing at a regular interval. The reason is that disabling automatic refreshing brings some performance benefits to my application, e.g. JVM heap usage rises less aggressively between garbage collections (see graph below).

[Graph: after disabling the refresh interval, heap usage rises less aggressively]

Upvotes: 28

Views: 37061

Answers (2)

Andrei Stefan

Reputation: 52366

There is a bit of confusion in your understanding. Refreshing the index and writing to disk are two different processes and are not necessarily related, which explains your observation that segments are still being written even when refresh_interval is -1.

When a document is indexed, it is added to the in-memory buffer and appended to the translog file. When a refresh takes place, the docs in the buffer are written to a new segment without an fsync, the segment is opened to make it visible to search, and the buffer is cleared. The translog is not yet cleared, and the new segment is not actually persisted to disk (as there was no fsync).
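To make the refresh step concrete, here is a minimal sketch of triggering one by hand through the `_refresh` endpoint, which makes the documents indexed so far searchable regardless of the configured refresh_interval. The `ES_URL` variable, the default index name `my-index`, and the helper names `refresh_url` and `refresh_now` are assumptions for illustration, not part of the original answer.

```shell
# Assumed defaults; override them for your own cluster and index.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX_NAME="${INDEX_NAME:-my-index}"

# Hypothetical helper: print the URL of the refresh endpoint for an index.
refresh_url() {
  printf '%s/%s/_refresh\n' "$ES_URL" "${1:-$INDEX_NAME}"
}

# Hypothetical helper: ask Elasticsearch to refresh the index right now.
refresh_now() {
  curl -s -XPOST "$(refresh_url "$1")"
}
```

Calling `refresh_now` after a batch of writes is one way to control visibility yourself while refresh_interval is -1.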

Now imagine refresh is disabled: no index refresh happens, you cannot search your newly indexed documents, and no segments are created in the filesystem cache.

The flush settings dictate when the flush (writing to disk) happens: by default, when the translog reaches 512mb in size or after 30 minutes. The flush is what actually persists data on disk; everything else is in the filesystem cache (if the node dies or the machine is rebooted, the cache is lost and the translog is the only salvation).
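As a rough sketch of the flush side, the size threshold can be changed through the `index.translog.flush_threshold_size` index setting, and a flush can be requested explicitly through the `_flush` endpoint. The `ES_URL` variable, the default index name `my-index`, and the helper names below are assumptions for illustration. Which translog settings are available depends on your Elasticsearch version, so check the documentation for the release you run.

```shell
# Assumed defaults; override them for your own cluster and index.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX_NAME="${INDEX_NAME:-my-index}"

# Hypothetical helper: print the settings body for a translog size threshold.
flush_threshold_body() {
  printf '{ "index" : { "translog" : { "flush_threshold_size" : "%s" } } }\n' "$1"
}

# Hypothetical helper: apply the threshold to the index.
set_flush_threshold() {
  curl -s -XPUT "$ES_URL/$INDEX_NAME/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(flush_threshold_body "$1")"
}

# Hypothetical helper: request a flush (persist segments to disk) right now.
flush_now() {
  curl -s -XPOST "$ES_URL/$INDEX_NAME/_flush"
}
```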

Upvotes: 49

Sachin

Reputation: 1715

By default, index.refresh_interval is set to 1s. Refreshing is an expensive operation in ES, especially while indexing; you can see this when you increase the refresh_interval.

Setting index.refresh_interval to -1 disables automatic refresh, which can give you a significant gain when indexing to ES. Just disable refresh_interval before indexing and enable it again when you finish indexing data:

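# Disable automatic refresh (recent ES versions require the Content-Type header)
curl -XPUT "http://localhost:9200/$INDEX_NAME/_settings" -H 'Content-Type: application/json' -d '{ "index" : { "refresh_interval" : "-1" } }'

# index data ...

# Restore the default refresh interval
curl -XPUT "http://localhost:9200/$INDEX_NAME/_settings" -H 'Content-Type: application/json' -d '{ "index" : { "refresh_interval" : "1s" } }'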

After indexing, you can set a value appropriate to your requirements to ensure consistency. A useful article: https://sematext.com/blog/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/
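For a use case that tolerates some delay before documents become visible, a longer interval is a middle ground between 1s and -1. A minimal sketch follows; the `ES_URL` variable, the default index name `my-index`, and the helper names are assumptions for illustration.

```shell
# Assumed defaults; override them for your own cluster and index.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX_NAME="${INDEX_NAME:-my-index}"

# Hypothetical helper: print the settings body for a given refresh interval.
refresh_interval_body() {
  printf '{ "index" : { "refresh_interval" : "%s" } }\n' "$1"
}

# Hypothetical helper: apply the interval to the index.
set_refresh_interval() {
  curl -s -XPUT "$ES_URL/$INDEX_NAME/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(refresh_interval_body "$1")"
}
```

For example, `set_refresh_interval 30s` would ask for a refresh roughly every 30 seconds; the value 30s is only illustrative and should be tuned against your own indexing load.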

Hope it helps!

Upvotes: 9
