Reputation:
I have few read heavy indices(started seeing performance issues on these indices) in my ES cluster which has ~50 million docs and noticed most of them have around 25% of total documents as deleted, I know that these deleted document count decrease over time when background merge operation happens, But in my case these count is always around ~25% of total documents and I have below questions/concerns:
Thanks
Upvotes: 4
Views: 3790
Reputation: 7221
your deleted documents are still part of the index so they impact the search performance ( but I can't tell you if its a huge impact ).
For the periodic merge, Lucene is "reluctant" to merge heavy segments as it requires some disk space and generates a lot of IO.
You can get some precious insight on your segments thanks to the Index Segments API
If you have segments close to the 5GB limit, it is probable that they won't be merged automatically until they are mostly constituted with deleted docs.
You can force a merge on your index with the force merge API
Remember a force merge can generate some stress on a cluster for huge indices. An option exists to only delete documents, that should reduce the burden.
only_expunge_deletes (Optional, boolean) If true, only expunge segments containing document deletions. Defaults to false.
In Lucene, a document is not deleted from a segment; just marked as deleted. During a merge, a new segment is created that does not contain those document deletions.
Regards
Upvotes: 3