We have the following scenario:
First we process the document deletions, and afterwards the updates arrive. FYI, it can happen that we delete a document which the updater then re-indexes a few minutes later.
My question: if ES marks a document (ID: D123) as deleted in a segment (let's say A), but afterwards a document with the same ID (D123) is indexed into another segment (B), the document should be searchable. BUT, what happens when the segments are merged?
Suppose segment B is merged into segment A, which contains the delete flag for the same document ID (D123).
After the merge, does the document still have the delete flag? I know that deleted documents are dropped when segments are merged. But does it matter which way around the merge happens, A into B or B into A?
We are losing some documents in this scenario and still can't figure out why.
As a short-term workaround, I filter out the documents to be deleted after reindexing.
I'd like to understand the whole process. It doesn't seem consistent at all!
Thanks
Lucene's segment merging creates a new segment containing the content of the previous segments, but without deleted or outdated documents. So, in your example, a new segment C will be created with the content of segments A and B, in that order, and the deleted documents are filtered out while copying. Also, each commit creates a new segment, and segments have generations (1, 2, ...). Each segment is therefore a snapshot of the interval between commits, and it wouldn't make sense to read B before A during the merge: inserts and deletes of the same document are not commutative, so we would be going "backwards" in time. In effect, you updated document ID:D123 by deleting it and inserting a new document with the same ID. There is no real update in Lucene's indexes: an update is a delete followed by an insert.
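To make the delete-then-insert sequence concrete, here is a toy model in Python. It is a simplified sketch, not Lucene's actual code, and every class and method name in it (`ToyIndex`, `flush`, `merge`, ...) is hypothetical. Each segment is immutable and carries a "live docs" list. A delete is resolved against the copies that exist at the moment it happens, so it marks a position in segment A rather than storing a flag for the ID. The copy re-indexed later into segment B is therefore untouched, and the merge keeps it. (In real Lucene, `IndexWriter.updateDocument` performs the delete-then-add for you.)

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical stand-in for an immutable segment plus its live-docs flags."""
    name: str
    docs: list[tuple[str, str]]  # (doc_id, payload); never modified after creation
    live: list[bool] = field(init=False)

    def __post_init__(self) -> None:
        # Every document starts out live; deletes only flip these flags.
        self.live = [True] * len(self.docs)


class ToyIndex:
    def __init__(self) -> None:
        self.segments: list[Segment] = []
        self._gen = 0  # segment generation counter (1, 2, ...)

    def flush(self, docs: list[tuple[str, str]]) -> Segment:
        """Write a batch of documents as a brand-new segment."""
        self._gen += 1
        seg = Segment(f"seg{self._gen}", list(docs))
        self.segments.append(seg)
        return seg

    def delete(self, doc_id: str) -> None:
        """Delete by ID: mark matching docs in the segments that exist *now*."""
        for seg in self.segments:
            for i, (did, _) in enumerate(seg.docs):
                if did == doc_id:
                    seg.live[i] = False

    def search(self, doc_id: str) -> list[str]:
        """Return payloads of all live copies of doc_id."""
        return [p for seg in self.segments
                for (did, p), alive in zip(seg.docs, seg.live)
                if did == doc_id and alive]

    def merge(self, a: Segment, b: Segment) -> Segment:
        """Copy only the live docs of a, then b, into a new segment."""
        merged = [d for seg in (a, b)
                  for d, alive in zip(seg.docs, seg.live) if alive]
        self._gen += 1
        new = Segment(f"seg{self._gen}", merged)
        self.segments = [s for s in self.segments if s is not a and s is not b]
        self.segments.append(new)
        return new


idx = ToyIndex()
a = idx.flush([("D123", "v1"), ("D456", "x")])
idx.delete("D123")               # marks D123 deleted in segment A only
b = idx.flush([("D123", "v2")])  # re-indexed a few minutes later into segment B
print(idx.search("D123"))        # ['v2']
c = idx.merge(a, b)
print(c.docs)                    # [('D456', 'x'), ('D123', 'v2')]
print(idx.search("D123"))        # ['v2']
```

In this model the "delete flag" never travels with the ID: only A's old copy is dropped during the merge, and the new copy from B survives into C.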