Igor Benikov

Reputation: 897

Elasticsearch index is taking up too much disk space

I have an index in production with 1 replica (total ~1 TB). New data is constantly coming into this index (a lot of updates and creates). When I created a copy of this index by running _reindex (with the same data and 1 replica as well), the new index took only 600 GB. It looks like the original index contains a lot of junk and some kind of leftovers that could be cleaned up, but I'm not sure how to do it.
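A quick way to see whether deleted-but-unreclaimed documents explain the gap is to check the deleted-doc count next to the store size. This is a sketch against a local cluster; `my-index` and `localhost:9200` are placeholders for your index name and endpoint.

```shell
# Show live docs, deleted docs, and on-disk size side by side.
# A large docs.deleted relative to docs.count suggests space
# held by old document versions that merges haven't reclaimed yet.
curl -s 'localhost:9200/_cat/indices/my-index?v&h=index,docs.count,docs.deleted,store.size,pri.store.size'

# More detail, including per-shard doc and store stats:
curl -s 'localhost:9200/my-index/_stats/docs,store?pretty'
```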

My questions: how can I clean up the index (without _reindex), why is this happening, and how can I prevent it in the future?

Upvotes: 1

Views: 2776

Answers (1)

ilvar

Reputation: 5841

Lucene segment files are immutable, so when you delete or update a document (Lucene can't update a doc in place), the old version is only marked as deleted, not actually removed from disk. ES periodically runs merge operations to "defragment" the data, but you can also trigger a merge manually with _forcemerge (try running it with only_expunge_deletes as well: it might be faster).
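For example, a manual merge could look like the following sketch (`my-index` is a placeholder; endpoints assume a local cluster):

```shell
# Cheaper option: merge only segments with a high proportion of
# deleted docs, reclaiming their space without a full rewrite.
curl -s -X POST 'localhost:9200/my-index/_forcemerge?only_expunge_deletes=true'

# Full force merge down to a single segment. Only do this on an
# index that no longer receives writes, since new updates will
# just start accumulating deletes in the big segment again.
curl -s -X POST 'localhost:9200/my-index/_forcemerge?max_num_segments=1'
```

Note that _forcemerge is a blocking, I/O-heavy operation, so run it during a quiet period.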

Also, make sure your shards are sized sensibly, and use ILM rollover to keep index size under control.
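An ILM rollover policy might look like this sketch. The policy name and the thresholds (50 GB primary shard size, 30-day rollover, 90-day delete) are illustrative placeholders, not recommendations for your workload:

```shell
# Hypothetical ILM policy: roll over to a new backing index when a
# primary shard reaches 50gb or the index is 30 days old, and delete
# indices 90 days after rollover.
curl -s -X PUT 'localhost:9200/_ilm/policy/my-rollover-policy' \
  -H 'Content-Type: application/json' -d '
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}'
```

Rollover keeps each backing index bounded, so merges stay cheap and old data can be dropped wholesale instead of accumulating deletes.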

Upvotes: 2
