Reputation: 171
I am creating an index using Lucene.Net 2.9.2. After a lot of indexing, the index has many segments and deleted documents, so I am calling Optimize(numSegments) on the IndexWriter.
The index's segment count is indeed reduced to the value of numSegments, but it still has deletions. Shouldn't a call to Optimize also remove all deleted documents?
This matters to me because I need to know whether this is how Lucene works or whether I have a bug.
Edit: here is my code snippet:
IndexWriter writer = new IndexWriter(/*open writer from index directory*/);
writer.Optimize(5);
writer.Commit();
bool hasDeletions = writer.HasDeletions();
hasDeletions is true, but I expected it to be false.
Upvotes: 2
Views: 2327
Reputation: 11
Optimize seems to be deleting the entire index?
I am new to Lucene.NET, but I have it wired up and everything seems great! I added test data, removed items, and then tried both Optimize(1) and ExpungeDeletes() (as seen above)...
but no matter how I approach this, it's not merging or whatever; it's just deleting the entire index?
My code looks like this (got it from a sample online):
public void Optimize()
{
    analyzer = new StandardAnalyzer(Version.LUCENE_30);
    using (var writer = new IndexWriter(_directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
    {
        analyzer.Close();
        //writer.Optimize(1);
        writer.ExpungeDeletes();
        writer.Dispose();
    }
}
I have no idea why this would delete the entire index?
Upvotes: 1
Reputation: 30695
Optimization merges segments, and during the segment merging, it removes the deletions that are listed in each one. If you don't do a full optimization, it's possible for deletions to remain, since the segments aren't merged/rebuilt.
This doesn't mean you need to do a full optimization in order to remove deletions.
IndexWriter writer = GetIndexWriter();
// delete some stuff
writer.ExpungeDeletes();
That will remove deleted documents from your index without doing a full optimization. It generally takes less time than an optimization, though it does depend on the MergePolicy, since it might still merge all the segments together (I believe by default it does not do this).
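To check whether the expunge actually cleared the deletions, something like the following might help. It is only a sketch: the IndexMaintenance class and ExpungeAndCheck method are hypothetical names of mine, and it assumes you already have an open IndexWriter, as in the snippet above. Whether every deletion goes away is still up to the MergePolicy, so treat the return value as a diagnostic rather than a guarantee.

```csharp
using Lucene.Net.Index;

// Hypothetical helper, not part of Lucene.NET itself.
public static class IndexMaintenance
{
    // Asks the writer to merge away deleted documents, commits,
    // and reports whether the index still has deletions afterwards.
    public static bool ExpungeAndCheck(IndexWriter writer)
    {
        // Requests merges for segments that carry deletions;
        // which segments get merged is decided by the MergePolicy.
        writer.ExpungeDeletes();

        // Make the merged segments visible to newly opened readers.
        writer.Commit();

        // Same check the question uses.
        return writer.HasDeletions();
    }
}
```

You would call it as bool stillHasDeletions = IndexMaintenance.ExpungeAndCheck(writer); after your deletes.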
Upvotes: 2
Reputation: 9964
Deletions can remain unless you provide 1 as the maximum number of segments.
But you shouldn't worry about this. To quote the documentation for IndexWriter#optimize in Lucene 3.5:
This method has been deprecated, as it is horribly inefficient and very rarely justified. Lucene's multi-segment search performance has improved over time, and the default TieredMergePolicy now targets segments with deletions.
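If you do need the index free of deletions on a version that still has Optimize, the single-segment form would look roughly like this. It is a sketch only; the FullOptimize class and OptimizeToOneSegment method are hypothetical names, and it assumes an already open IndexWriter as in the question. Expect it to be slow on a large index, for the reasons the quoted documentation gives.

```csharp
using Lucene.Net.Index;

// Hypothetical helper, not part of Lucene.NET itself.
public static class FullOptimize
{
    // Merges the index down to a single segment and reports
    // whether any deletions are still present afterwards.
    public static bool OptimizeToOneSegment(IndexWriter writer)
    {
        // 1 is the maximum number of segments to leave in the index.
        // Unlike Optimize(5) in the question, this rewrites everything
        // into one segment, so deleted documents are not carried over.
        writer.Optimize(1);

        // Commit so the optimized index is what readers will open.
        writer.Commit();

        return writer.HasDeletions();
    }
}
```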
Upvotes: 5