Roey

Reputation: 849

Lucene AddIndexes (merge) - how to avoid duplicates?

How do I make sure that when I merge several temp indexes (which may or may not contain duplicate documents), I end up with only one copy of each document in the main index?

Thanks

Upvotes: 2

Views: 1470

Answers (1)

Yuval F

Reputation: 20621

Here's one way, provided that each document has an id and that duplicate documents share the same id:

label the indexes I1..Im
for i in 1..m:
  let Ci = all the indexes except Ii
  for each document Dj in Ii:
    let cur_term = "id:<Dj's id>"
    for each Ik in Ci:
      Ik.deleteDocuments(cur_term)
merge all indexes

The gist is: for each document, delete every document with the same id from the other indexes. After doing this for all the indexes, merge them. I know this is not elegant, but I do not know of a better algorithm.
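To make the pseudocode above concrete, here is a minimal in-memory sketch in Python. It is only a model of the idea, not Lucene code: each "index" is a plain list of dicts, and the names `delete_documents` and `dedupe_then_merge` are hypothetical helpers made up for this illustration. In Lucene itself the delete step would presumably go through `IndexWriter.deleteDocuments(Term)` and the final merge through `IndexWriter.addIndexes`. The sketch also assumes that deletes are visible immediately, so later passes only see documents that survived earlier ones.

```python
from typing import Dict, List

# Hypothetical stand-in for a Lucene index: a list of documents,
# where each document is a dict carrying an "id" field.
Index = List[Dict[str, str]]


def delete_documents(index: Index, doc_id: str) -> None:
    """Remove, in place, every document in `index` whose id equals `doc_id`."""
    index[:] = [doc for doc in index if doc["id"] != doc_id]


def dedupe_then_merge(indexes: List[Index]) -> Index:
    """Delete cross-index duplicates in place, then concatenate all indexes."""
    for i, current in enumerate(indexes):
        # Snapshot the ids of the documents still present in the current index.
        ids = [doc["id"] for doc in current]
        for k, other in enumerate(indexes):
            if k == i:
                continue  # never delete from the index being scanned
            for doc_id in ids:
                delete_documents(other, doc_id)
    # "merge all indexes": here simply a concatenation.
    merged: Index = []
    for index in indexes:
        merged.extend(index)
    return merged


temp_indexes = [
    [{"id": "a"}, {"id": "b"}],
    [{"id": "b"}, {"id": "c"}],
    [{"id": "a"}, {"id": "c"}, {"id": "d"}],
]
merged = dedupe_then_merge(temp_indexes)
print(sorted(doc["id"] for doc in merged))  # ['a', 'b', 'c', 'd']
```

Note that this only removes duplicates across indexes. If a single temp index already contains two documents with the same id, both copies survive in this sketch, so that case would need separate handling.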

Upvotes: 1
