eyal

Reputation: 379

Lucene.net optimize unfinished loop

I'm using Lucene.Net 2.9.1 and have run into the following problem when calling Optimize: some calls take hours, and while such a call is running, the process that does the indexing and optimizing cannot be killed. Stepping through the source code, I tracked the problem down to Optimize(int maxNumSegments, bool doWait). Inside this method there are repeated calls to OptimizeMergesPending(), which always returns true, so the loop keeps calling it until it returns false, which can take ages.

This raises the following questions:
1. What can cause OptimizeMergesPending() to keep returning true?
2. What can prevent the process that indexes and optimizes from being killed?
3. Do newer versions of Lucene.Net show the same behavior?

Thanks

Upvotes: 0

Views: 239

Answers (1)

sisve

Reputation: 19781

The xmldocs for IndexWriter.OptimizeMergesPending state that it returns true "if any merges in pendingMerges or runningMerges are optimization merges". The inline documentation for IndexWriter.DoWait states that it waits for at most one second, to guard against notifications that are never triggered; it is up to the caller to re-evaluate the waiting condition. In other words, the loop in Optimize keeps running for as long as optimization merges are still pending or running. I linked to the 2.9.4g source code, so newer versions also contain this behavior.
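The wait pattern described above can be sketched without any Lucene dependency. The following is a minimal illustration in Java (the language of the original Lucene, from which Lucene.Net is ported), not the actual IndexWriter code: `WaitLoopSketch`, `mergeFinished`, `startMerges` and the timeout values are hypothetical names and numbers chosen for the example. Java's `synchronized` and `wait(ms)` play the role of C#'s `lock` and `Monitor.Wait`.

```java
/** Hypothetical stand-in for the writer; this is not the Lucene.Net class. */
class WaitLoopSketch {
    private int pendingOptimizeMerges; // guarded by this object's monitor

    WaitLoopSketch(int pendingOptimizeMerges) {
        this.pendingOptimizeMerges = pendingOptimizeMerges;
    }

    /** True while at least one (simulated) optimization merge is unfinished. */
    synchronized boolean optimizeMergesPending() {
        return pendingOptimizeMerges > 0;
    }

    /** Called by the simulated merge thread each time a merge completes. */
    synchronized void mergeFinished() {
        pendingOptimizeMerges--;
        notifyAll();
    }

    /**
     * Blocks until no merges are pending. Each wait is bounded by timeoutMillis,
     * but the loop itself is not: it re-checks the condition and waits again.
     * Returns how many times it had to wait.
     */
    synchronized int optimize(long timeoutMillis) {
        int waits = 0;
        while (optimizeMergesPending()) {
            waits++;
            try {
                wait(timeoutMillis); // releases the monitor while waiting
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // give up if the thread is interrupted
            }
        }
        return waits;
    }

    /** Starts a daemon thread that finishes `count` merges, one every delayMillis. */
    static Thread startMerges(WaitLoopSketch writer, int count, long delayMillis) {
        Thread t = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    return;
                }
                writer.mergeFinished();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        WaitLoopSketch writer = new WaitLoopSketch(3);
        startMerges(writer, 3, 50);
        int waits = writer.optimize(20);
        System.out.println("optimize returned after " + waits + " waits");
    }
}
```

The timeout only bounds a single wait. If the merges take hours, a loop of this shape spins for hours as well, waking up periodically just to find the condition unchanged.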

An unkillable process is an operating system issue; you should always be able to kill a process as long as it isn't blocked in a kernel/system call. We would need to see process dumps to debug that. (Or a better explanation of how you're trying to kill the process...)

Counter-questions:

  1. Why are you calling IndexWriter.Optimize? Lucene can handle several segments; in fact, it's cheaper to reopen an index when only a few segments have changed than to reopen a completely new segment containing the whole index. You could write your own MergePolicy if you have issues with the current handling of segments. Optimize has been deprecated as of Lucene 3.5, which Lucene.Net currently lags behind (it's at 3.0.3 at the moment, and porting of 4.x is in progress).
  2. Are you ever locking on your IndexWriter? The code I linked does lock (this) {...}, which is bad practice and may cause deadlocks if you also lock on your writer. This can look as if your code hangs, and any clean thread termination you may have built will not be triggered (since the thread just blocks).
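To make the second counter-question concrete, here is a small Java sketch of the hazard, written without any Lucene dependency. `LockOnWriterSketch`, its nested `Writer` and `commit()` are hypothetical stand-ins, not the real IndexWriter. A `synchronized` method locks on `this`, just like `lock (this)` in C#, so outside code that locks on the same object competes for the same monitor.

```java
class LockOnWriterSketch {
    /** Hypothetical writer whose methods lock on "this", as the linked code does. */
    static class Writer {
        synchronized void commit() {
            // internal work performed while holding the writer's own monitor
        }
    }

    /**
     * Holds the writer's monitor from the calling thread and checks that a worker
     * calling commit() gets stuck in the BLOCKED state until the monitor is released.
     */
    static boolean workerBlocksWhileCallerHoldsLock(Writer writer) {
        Thread worker = new Thread(writer::commit);
        boolean sawBlocked = false;
        synchronized (writer) { // application code locking on the writer itself
            worker.start();
            long deadline = System.currentTimeMillis() + 2000;
            while (System.currentTimeMillis() < deadline) {
                if (worker.getState() == Thread.State.BLOCKED) {
                    sawBlocked = true;
                    break;
                }
                Thread.yield();
            }
        } // releasing the monitor lets commit() proceed
        try {
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sawBlocked && !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker blocked: " + workerBlocksWhileCallerHoldsLock(new Writer()));
    }
}
```

Here the caller releases the monitor, so the worker eventually continues. If the thread holding the lock were itself waiting for something that only the blocked thread can do, neither would make progress, which is the deadlock scenario described above. Locking on a private object of your own instead of the writer avoids sharing the monitor.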

Update regarding a continuously changing index.

  1. Never call IndexWriter.Optimize(); it causes unnecessary CPU and I/O load, both during the actual merge and when reopening your readers.
  2. Reopen your reader and searcher on a separate thread, or when calling IndexWriter.Commit. Do not wait until a user needs to search to reopen them.
  3. Call IndexReader.Reopen() instead of IndexReader.Open(). Reopen only loads changed segments and reuses those that were already read and have not changed. (And remember, deletions are just a separate bitmap; only the bitmap is re-read, not the complete segment.)
  4. Consider upgrading to Lucene.Net 3.0.3 and using IndexWriter.IndexReaderWarmer to write custom warmup logic that ensures your segments are fully read into cache/memory before users start using them.
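Why reopening is cheaper than a fresh open, and why Optimize works against it, can be shown with a toy model. The Java sketch below is purely illustrative: `ReopenSketch`, `Segment`, `Reader` and the segment names are hypothetical and only model the idea of segment reuse, not the real IndexReader implementation.

```java
/** Hypothetical model of segment reuse; none of these types are Lucene.Net classes. */
class ReopenSketch {
    /** One loaded segment. Creating it stands in for the expensive read from disk. */
    static final class Segment {
        final String name;

        Segment(String name) {
            this.name = name;
        }
    }

    /** Immutable snapshot of the segments that were live when it was created. */
    static final class Reader {
        final Segment[] segments;
        final int loaded; // how many segments had to be read to build this snapshot

        private Reader(Segment[] segments, int loaded) {
            this.segments = segments;
            this.loaded = loaded;
        }

        /** Cold open: every segment is read. */
        static Reader open(String... names) {
            Segment[] all = new Segment[names.length];
            for (int i = 0; i < names.length; i++) {
                all[i] = new Segment(names[i]);
            }
            return new Reader(all, names.length);
        }

        /** Reopen: reuse segments this reader already holds, read only the new ones. */
        Reader reopen(String... names) {
            Segment[] next = new Segment[names.length];
            int loadedNow = 0;
            for (int i = 0; i < names.length; i++) {
                Segment existing = find(names[i]);
                if (existing != null) {
                    next[i] = existing;
                } else {
                    next[i] = new Segment(names[i]);
                    loadedNow++;
                }
            }
            return new Reader(next, loadedNow);
        }

        /** Returns the held segment with this name, or null if it is not held. */
        Segment find(String name) {
            for (Segment s : segments) {
                if (s.name.equals(name)) {
                    return s;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Reader first = Reader.open("_0", "_1", "_2");
        // One small segment was added; "_0" to "_2" are unchanged.
        Reader second = first.reopen("_0", "_1", "_2", "_3");
        System.out.println("cold open read " + first.loaded + ", reopen read " + second.loaded);
    }
}
```

In this model, an optimize replaces all existing segments with one new segment, so the next reopen finds nothing to reuse and has to read the whole index again. That is the extra load on readers that point 1 warns about.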

Upvotes: 4
