Reputation: 1684
In our project, we call Elasticsearch's index refresh API after every create/update/delete operation so that changes are immediately searchable.
How will Elasticsearch perform if multiple parallel requests hit the refresh API on a single index holding close to 2.5 million documents?
Any thoughts or suggestions?
Upvotes: 0
Views: 683
Reputation: 7221
A refresh is the operation where Elasticsearch asks each Lucene shard to write its buffered changes into a new segment and open that segment for search. If you request a refresh after every operation, you will create a huge number of micro-segments.
Too many segments slow down your searches, because each shard has to search through all of them sequentially before it can return a result. They also consume hardware resources.
"Each segment consumes file handles, memory, and CPU cycles. More important, every search request has to check every segment in turn; the more segments there are, the slower the search will be." (from the Definitive Guide)
Lucene will automatically merge those segments into bigger ones, but merging is itself an I/O-intensive task.
You can check this for more details
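To get a feel for how quickly micro-segments pile up, here is a toy Python sketch. It is not Lucene code: `segments_created` is a hypothetical helper that assumes every refresh with pending changes writes exactly one new segment, and it ignores background merging entirely.

```python
def segments_created(num_operations: int, refresh_every: int) -> int:
    """Count the segments a toy shard would write.

    Assumption (not real Lucene behaviour): each refresh that finds
    buffered changes produces exactly one new segment, and no merging
    happens in the background.
    """
    if refresh_every < 1:
        raise ValueError("refresh_every must be at least 1")

    segments = 0
    buffered = 0
    for _ in range(num_operations):
        buffered += 1                  # one create/update/delete lands in the buffer
        if buffered == refresh_every:  # time to refresh
            segments += 1              # buffer becomes a new segment
            buffered = 0
    if buffered:
        segments += 1                  # a final refresh picks up the remainder
    return segments


if __name__ == "__main__":
    print("refresh per write:     ", segments_created(10_000, 1))
    print("refresh per 500 writes:", segments_created(10_000, 500))
```

Under these assumptions, 10,000 writes refreshed one by one leave 10,000 tiny segments for the merger to clean up, versus 20 when each refresh covers 500 writes.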
But as far as I know, a refresh on a 2.5 million document index takes about the same time as on a 2.5k document index. Also, it seems (from this issue) that refresh is a non-blocking operation.
Still, it is a bad pattern for an Elasticsearch cluster. Does every CUD operation in your application really need a refresh?
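If only some writes need to be visible right away, one possible alternative to calling the refresh API yourself is the `refresh=wait_for` query parameter on the write request, combined with the index's `refresh_interval` setting. Below is a minimal sketch that only builds the HTTP requests with the Python standard library and does not send them. The host `http://localhost:9200`, the index name `products`, and the helper names are assumptions for illustration, and the `_doc` endpoint assumes a reasonably recent Elasticsearch version.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local single-node cluster
JSON_HEADERS = {"Content-Type": "application/json"}


def index_request(index: str, doc_id: str, document: dict,
                  wait_for_refresh: bool = False) -> Request:
    """Build (but do not send) a PUT request that indexes one document.

    With wait_for_refresh=True the request carries refresh=wait_for, so
    Elasticsearch answers only once a regular refresh has made the change
    searchable, instead of forcing an extra refresh.
    """
    url = f"{BASE_URL}/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
    if wait_for_refresh:
        url += "?refresh=wait_for"
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, headers=JSON_HEADERS, method="PUT")


def refresh_interval_request(index: str, interval: str = "1s") -> Request:
    """Build (but do not send) a request that sets index.refresh_interval."""
    url = f"{BASE_URL}/{quote(index, safe='')}/_settings"
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return Request(url, data=body, headers=JSON_HEADERS, method="PUT")


if __name__ == "__main__":
    req = index_request("products", "42", {"name": "kettle"}, wait_for_refresh=True)
    print(req.get_method(), req.full_url)
```

To actually send one of these, pass it to `urllib.request.urlopen`. Whether waiting for the next scheduled refresh is acceptable depends on how fresh your search results really need to be.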
Upvotes: 1