Reputation: 1610
I want to get information about a Lucene index in real time while indexing is still in progress. To do that, I use the CheckIndex class in my code, as follows:
CheckIndex.Status indexStatus = checkIndex.checkIndex();
// Sum the term count and on-disk size over every segment that CheckIndex reports.
for (CheckIndex.Status.SegmentInfoStatus segment : indexStatus.segmentInfos) {
    term_number += segment.termIndexStatus.termCount;
    index_MB_size += segment.sizeMB;
}
At first, the index folder is empty. I output term_number and index_MB_size each time Lucene finishes indexing a text file (about 10 MB each, about 600 MB in total), so I get about 60 pairs of results. Unfortunately, both variables are 0 in all 60 pairs. Only once the index segment is finished does the statistical information show a non-zero result.
I guess this happens because the CheckIndex class only works correctly after indexing has finished, but I haven't verified that.
How can I get this information in real time? In addition, running CheckIndex is really time-consuming. Is there a better way to get these two pieces of information (index size and number of terms)?
Upvotes: 3
Views: 728
Reputation: 1610
I finally found the answer in the book Lucene in Action, 2nd Edition.
The reason is that IndexReader and CheckIndex can only see changes to the index after the IndexWriter has called its commit() method. A commit is very different from a flush: a flush only writes the buffered data to disk, while a commit flushes first and then makes all the changes visible to IndexReader.
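To make the difference concrete, here is a minimal toy model of that visibility rule in plain Java. It is not Lucene code: ToyWriter and its three lists are hypothetical names invented for illustration, and the model only assumes the behaviour described above (a flush moves buffered data to disk, and only a commit exposes it to readers).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index writer; NOT the Lucene API.
class ToyWriter {
    private final List<String> buffered = new ArrayList<>();  // in memory only
    private final List<String> flushed = new ArrayList<>();   // on disk, not yet visible
    private final List<String> committed = new ArrayList<>(); // visible to readers

    void addDocument(String doc) {
        buffered.add(doc);
    }

    // Move buffered documents to "disk" without exposing them to readers.
    void flush() {
        flushed.addAll(buffered);
        buffered.clear();
    }

    // Flush first, then publish everything that has been flushed.
    void commit() {
        flush();
        committed.addAll(flushed);
        flushed.clear();
    }

    // What a reader (or a CheckIndex-style tool) opened right now would see.
    int visibleDocCount() {
        return committed.size();
    }
}

class CommitVisibilityDemo {
    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.addDocument("file-1");
        writer.flush();
        System.out.println("after flush:  " + writer.visibleDocCount()); // prints 0
        writer.commit();
        System.out.println("after commit: " + writer.visibleDocCount()); // prints 1
    }
}
```

Applied to the question, this suggests calling commit() on the IndexWriter after each text file and only then collecting the statistics. The cost of committing that often is not modelled here.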
Upvotes: 1