Reputation: 897
I have an index in production with 1 replica (it takes ~1 TB in total). New data is constantly coming into this index (a lot of updates and creates).
When I created a copy of this index by running _reindex
(with the same data and 1 replica as well), the new index took only 600 GB.
It looks like there is a lot of junk, and maybe some kind of logs, in the original index that could be cleaned up, but I am not sure how to do it.
The questions: how do I clean up the index (without _reindex), why is this happening, and how do I prevent it in the future?
Upvotes: 1
Views: 2776
Reputation: 5841
Lucene segment files are immutable, so when you delete or update a document (it can't update a doc in place), the old version is only marked as deleted
but not actually removed from disk. ES periodically runs a merge
operation to "defragment" the data, but you can also trigger a merge manually with _forcemerge (try running it with only_expunge_deletes
as well: it might be faster).
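For illustration, here is a minimal Python sketch using the elasticsearch-py client (8.x-style calls); the index name "my-index" and the localhost URL are placeholders, and the same thing can be done directly against the REST API (GET my-index/_stats and POST my-index/_forcemerge?only_expunge_deletes=true):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # placeholder cluster address

    # See how many deleted-but-not-yet-merged docs the index is still carrying
    stats = es.indices.stats(index="my-index")
    docs = stats["indices"]["my-index"]["primaries"]["docs"]
    print(f"live docs: {docs['count']}, deleted docs: {docs['deleted']}")

    # Reclaim space held by deleted docs (usually cheaper than a full merge)
    es.indices.forcemerge(index="my-index", only_expunge_deletes=True)

    # Full merge down to one segment; best reserved for indices that are no
    # longer written to, since such a large segment won't be merged again
    # es.indices.forcemerge(index="my-index", max_num_segments=1)

The deleted-docs count gives you a rough idea of how much of the index size is reclaimable by a merge.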
Also, make sure your shards are sized correctly and use ILM rollover to keep index size under control.
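And a rough sketch of the rollover suggestion, again with the Python client; the policy name and thresholds are made-up examples, and rollover suits append-only/time-series indices better than ones that are updated in place:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # placeholder cluster address

    # Hypothetical policy: roll over once a primary shard reaches 50 GB or the
    # index is 30 days old, and delete indices once they are 90 days old
    es.ilm.put_lifecycle(
        name="my-rollover-policy",
        policy={
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": "50gb",
                            "max_age": "30d",
                        }
                    }
                },
                "delete": {"min_age": "90d", "actions": {"delete": {}}},
            }
        },
    )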
Upvotes: 2