user12056260

Does a huge number of deleted docs affect ES query performance?

I have a few read-heavy indices in my ES cluster, which has ~50 million docs, and I have started seeing performance issues on these indices. I noticed that in most of them around 25% of the total documents are deleted. I know that the deleted document count decreases over time as background merge operations happen, but in my case it always stays at around 25% of total documents. I have the following questions/concerns:

  1. Will this huge number of deleted docs affect search performance? They are still part of the immutable Lucene segments, and a search runs against all segments and returns the latest version of each document. So the immutable segments would be large because they contain a huge number of deleted docs, and then there is another operation to figure out the latest version of each doc.
  2. Will the periodic merge operation take a lot of time and be inefficient when there is a huge number of deleted documents?
  3. Is there any way to remove this huge number of deleted docs in one shot? It looks like the background merge operation is not able to keep up with them.

Thanks

Upvotes: 4

Views: 3790

Answers (1)

Pierre Mallet

Reputation: 7221

Your deleted documents are still part of the index, so they do impact search performance (but I can't tell you whether the impact is huge).

As for the periodic merge, Lucene is "reluctant" to merge large segments, as doing so requires extra disk space and generates a lot of I/O.

You can get some valuable insight into your segments from the Index Segments API.
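As a rough sketch of how that output can be used, the Python below computes the share of deleted docs per segment from an Index Segments API response (`GET /<index>/_segments`). The `sample_response` dict is hand-written illustrative data, not real output; the index name `my-index` and the helper `deleted_ratio_by_segment` are hypothetical; and the response layout is an assumption that may differ between Elasticsearch versions.

```python
from typing import List, Tuple


def deleted_ratio_by_segment(response: dict, index: str) -> List[Tuple[str, str, float, int]]:
    """Return (shard, segment, deleted_ratio, size_in_bytes) rows, worst ratio first."""
    rows = []
    shards = response["indices"][index]["shards"]
    for shard_id, copies in shards.items():
        for copy in copies:  # one entry per shard copy (primary or replica)
            for seg_name, seg in copy["segments"].items():
                live = seg["num_docs"]         # assumed to exclude deleted docs
                deleted = seg["deleted_docs"]  # docs marked deleted, not yet merged away
                total = live + deleted
                ratio = deleted / total if total else 0.0
                rows.append((shard_id, seg_name, ratio, seg["size_in_bytes"]))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hand-written sample shaped like a _segments response (assumed layout, made-up numbers).
sample_response = {
    "indices": {
        "my-index": {
            "shards": {
                "0": [
                    {
                        "segments": {
                            "_0": {"num_docs": 750, "deleted_docs": 250, "size_in_bytes": 4_800_000_000},
                            "_1": {"num_docs": 90, "deleted_docs": 10, "size_in_bytes": 1_000_000},
                        }
                    }
                ]
            }
        }
    }
}

for shard, segment, ratio, size in deleted_ratio_by_segment(sample_response, "my-index"):
    print(f"shard {shard} segment {segment}: {ratio:.0%} deleted, {size} bytes")
```

Sorting by ratio makes it easy to spot large segments that carry many deletes but have not been picked up by a merge yet.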

If you have segments close to the 5 GB limit, they probably won't be merged automatically until they consist mostly of deleted docs.

You can force a merge on your index with the force merge API.

Remember that a force merge can put some stress on a cluster for huge indices. An option exists to only expunge deleted documents, which should reduce the burden:

only_expunge_deletes (Optional, boolean) If true, only expunge segments containing document deletions. Defaults to false.
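A minimal sketch of building such a request with Python's standard library follows. The base URL `http://localhost:9200`, the index name `my-index`, and the helper `build_forcemerge_request` are assumptions for illustration; authentication and TLS are left out, and the request is only constructed here, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_forcemerge_request(base_url: str, index: str, only_expunge_deletes: bool = True) -> Request:
    """Build (but do not send) a POST request to the force merge endpoint."""
    # Elasticsearch expects lowercase "true"/"false" in the query string.
    params = urlencode({"only_expunge_deletes": str(only_expunge_deletes).lower()})
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{params}"
    return Request(url, method="POST")


req = build_forcemerge_request("http://localhost:9200", "my-index")
print(req.get_method(), req.full_url)

# To actually send it, something like the following could be used. A force merge
# can run for a long time on a large index, so pick a generous timeout:
# from urllib.request import urlopen
# with urlopen(req, timeout=3600) as resp:
#     print(resp.read().decode())
```

Running this against a read-heavy index during a quiet period is probably the safer choice, given the extra disk space and I/O a merge needs.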

In Lucene, a document is not physically removed from a segment; it is just marked as deleted. During a merge, a new segment is created that does not contain those deleted documents.
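To make the mark-then-merge idea concrete, here is a toy Python model. It is not how Lucene is implemented; the `Segment` class and `merge` function are hypothetical and only mimic the behaviour described above: a delete just records a marker, and only a merge writes a new segment without the marked docs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Segment:
    docs: Dict[str, str]                            # doc id -> content, never changed after creation
    deleted: Set[str] = field(default_factory=set)  # ids that are only marked as deleted

    def delete(self, doc_id: str) -> None:
        if doc_id in self.docs:
            self.deleted.add(doc_id)  # mark only; the doc still takes up space in the segment

    def live_docs(self) -> Dict[str, str]:
        return {i: d for i, d in self.docs.items() if i not in self.deleted}


def merge(segments: List[Segment]) -> Segment:
    """Write a new segment holding only the live docs of the input segments."""
    merged: Dict[str, str] = {}
    for seg in segments:
        merged.update(seg.live_docs())
    return Segment(docs=merged)


first = Segment({"1": "a", "2": "b"})
second = Segment({"3": "c"})
first.delete("2")

print(len(first.docs), len(first.deleted))  # the deleted doc is still stored in the old segment
merged = merge([first, second])
print(sorted(merged.docs), len(merged.deleted))
```

In this model, searches against `first` still have to skip doc "2" until the merge happens, which is the overhead the question asks about.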

Regards

Upvotes: 3
