Lucene.Net 3.0.5 - Near Real Time Search, reopening reader performance

Question

I'm indexing a sequence of documents with the IndexWriter and committing the changes at the end of the iteration.

To access the uncommitted changes, I'm using near real-time search (NRTS) as described here.

Imagine that I'm indexing 1000 documents and iterating through them to check whether there are any I can reuse or update (due to some specific requirements I have).

I'm reopening the reader at each iteration:

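    using (var indexReader = writer.GetReader())          // NRT reader: includes uncommitted changes
    using (var searcher = new IndexSearcher(indexReader)) // both disposed at the end of each iteration
    { /* search for a document to reuse or update */ }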

How slow should reopening the reader be? Once the index reaches around 300K documents, indexing 1000 documents occasionally takes around 60 seconds, even though the documents contain little text.

Am I taking the wrong approach? Please advise.

Michael Gorsich · Accepted Answer

To improve performance, you need to optimize less often.

I use a separate timer for optimization. Every 40 minutes it enables optimization down to five segments (a good value according to "Lucene in Action"), which then happens only if the indexer is running (there's no need to optimize while the indexer is shut down). Then, once a day, at a time of very low usage, it enables optimization down to one segment. The one-segment optimization usually takes about 5 minutes for me. Feel free to borrow my strategy, but in any case, don't optimize so often: optimizing is hurting your overall indexing rate, especially because your documents are small, so the 500-document iteration loop must be running frequently.
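
The schedule above can be sketched as a small decision function that the timer, or the indexing loop after each batch, consults. This is a minimal sketch, not the answerer's actual code: the 40-minute interval and the five-segment and one-segment targets come from the answer, while the 03:00 quiet hour and all names (`SegmentsToOptimizeTo`, `lastPartial`, `lastFull`) are hypothetical. Acting on the result would use Lucene.Net 3.x's `IndexWriter.Optimize(int maxNumSegments)`, shown only in a comment so the sketch runs without an index.

```csharp
using System;

// Hypothetical optimization policy based on the schedule in the answer:
// optimize to 5 segments every 40 minutes, and to 1 segment once a day
// at a low-usage time. Assumption: 03:00 is the low-usage hour.
TimeSpan partialInterval = TimeSpan.FromMinutes(40);
const int PartialSegments = 5;
const int FullSegments = 1;
const int QuietHour = 3;

// Returns the segment count to optimize down to, or null to skip optimizing.
// lastPartial / lastFull record when each kind of optimize last ran;
// the caller updates them after acting on the result.
int? SegmentsToOptimizeTo(DateTime now, DateTime lastPartial, DateTime lastFull)
{
    // Once per calendar day, during the quiet hour: optimize to one segment.
    if (now.Hour == QuietHour && now.Date > lastFull.Date)
        return FullSegments;

    // Otherwise, once 40 minutes have passed: optimize to five segments.
    if (now - lastPartial >= partialInterval)
        return PartialSegments;

    return null;
}

// In the indexing loop, after a batch is committed (Lucene.Net 3.x):
// int? target = SegmentsToOptimizeTo(DateTime.Now, lastPartial, lastFull);
// if (target.HasValue) writer.Optimize(target.Value);
```

Because the indexing loop is what acts on the result, optimization only happens while the indexer is running, as the answer describes, and the policy itself can be checked without touching an index.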

You could also add some temporary logging at the various stages to see where your indexer spends its time, so you can tune the iteration size, the settling time between loops (if you're paranoid like me), the optimization frequency, and so on.
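
A minimal sketch of that kind of temporary instrumentation, using `System.Diagnostics.Stopwatch` to accumulate time per stage. The `Timed` and `ReportStages` helpers and the stage names are hypothetical; you would wrap whichever calls your own indexer makes (for example `writer.GetReader()`, the search, or `writer.Commit()`).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Running total of elapsed time per indexing stage (temporary instrumentation).
var stageTotals = new Dictionary<string, TimeSpan>();

// Runs the action and adds its elapsed wall-clock time to the named stage,
// even if the action throws.
void Timed(string stage, Action action)
{
    var stopwatch = Stopwatch.StartNew();
    try
    {
        action();
    }
    finally
    {
        stopwatch.Stop();
        stageTotals.TryGetValue(stage, out TimeSpan soFar); // TimeSpan.Zero for a new stage
        stageTotals[stage] = soFar + stopwatch.Elapsed;
    }
}

// Prints one line per stage, slowest first; e.g. call it at the end of each batch.
void ReportStages()
{
    foreach (var entry in stageTotals.OrderByDescending(e => e.Value))
        Console.WriteLine($"{entry.Key,-10} {entry.Value.TotalMilliseconds,10:F1} ms");
}

// Usage inside the indexing loop (hypothetical stage names; Lucene.Net 3.x calls):
// Timed("add", () => writer.AddDocument(doc));
// Timed("commit", () => writer.Commit());
```

Totals per stage, rather than per-call log lines, keep the overhead low and make it obvious whether reopening, searching, committing, or optimizing dominates a slow batch.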
