Yuhao

Reputation: 1610

Lucene: How to get real-time index information during indexing? (Index size and term count)

I want to get information about a Lucene index in real time while indexing is in progress, so I use the CheckIndex class in my code, as follows:

// term_number and index_MB_size are counters declared elsewhere in my code
CheckIndex.Status indexStatus = checkIndex.checkIndex();
for (CheckIndex.Status.SegmentInfoStatus segment : indexStatus.segmentInfos) {
    term_number += segment.termIndexStatus.termCount;
    index_MB_size += segment.sizeMB;
}

At first, the index folder is empty. I print term_number and index_MB_size each time Lucene finishes indexing a text file (about 10 MB each, about 600 MB in total), so I get about 60 pairs of results. However, both values are 0 in all 60 pairs. Only when the index segment is finished does the statistical information show a non-zero result.

I guess this is because CheckIndex only works correctly after indexing has finished, but I haven't verified this.

How can I get this information in real time? In addition, running CheckIndex is really time-consuming. Is there a better way to get these two pieces of information (index size and term count)?

Upvotes: 3

Views: 728

Answers (1)

Yuhao

Reputation: 1610

I finally found the answer in the book Lucene in Action, 2nd Edition.

This happens because IndexReader and CheckIndex only see changes to the index after the IndexWriter has called commit(). Commit is very different from flush: flush only writes the buffered data to disk, while commit first flushes and then makes all the changes visible to IndexReader.
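
Since flush writes data to disk before any commit, the on-disk size (though not the term count) should be observable without CheckIndex by summing the file sizes in the index directory. Below is a minimal sketch using only the JDK. The class name IndexDirSize and the "index" path are made up for illustration, and the number it reports will include flushed files that an IndexReader cannot see yet:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene: measures the index directory on disk.
public class IndexDirSize {

    /** Total size in bytes of the regular files directly inside indexDir. */
    public static long sizeInBytes(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files
                .filter(Files::isRegularFile)
                .mapToLong(p -> {
                    try {
                        return Files.size(p);
                    } catch (IOException e) {
                        // A file can disappear while we scan (for example
                        // after a segment merge); count it as 0 bytes.
                        return 0L;
                    }
                })
                .sum();
        }
    }

    /** Same total, converted to megabytes (1 MB = 1024 * 1024 bytes). */
    public static double sizeInMB(Path indexDir) throws IOException {
        return sizeInBytes(indexDir) / (1024.0 * 1024.0);
    }

    public static void main(String[] args) throws IOException {
        // "index" is an assumed default; pass the real index folder as args[0].
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "index");
        System.out.printf("index size: %.2f MB%n", sizeInMB(indexDir));
    }
}
```

Calling sizeInMB after each text file is indexed is cheap compared with a full CheckIndex run. For the term count, you still need to call commit() first and then use a reader or CheckIndex.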

Upvotes: 1
