Reputation: 2437
I use Lucene 5.3.1 and have already indexed some documents. Now I am trying to find a built-in function that returns the total token count across the whole collection (index).
I know that I can iterate over all documents and sum their lengths, but my algorithms are already complex and that would increase the run time, so I am trying to avoid this approach. I think Lucene may have an API for this.
I searched for this function (or anything similar), but I could not find any useful link.
So the question is: is there a built-in function that returns the number of ALL TOKENS in the collection (i.e. the whole index)? If not, is there another efficient approach?
Any help is appreciated, thanks.
Upvotes: 0
Views: 206
Reputation: 2437
Eventually I found the solution.
I use CollectionStatistics in the following way:
CollectionStatistics collectionStats = indexSearcher.collectionStatistics("Body");
long token_count = collectionStats.sumTotalTermFreq();
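The sumTotalTermFreq() method returns the total number of tokens in the given field (here "Body") across the whole collection. The value is an index-level statistic, so it is the same for every query.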
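To show why this statistic matches the "iterate over all documents and sum their lengths" approach from the question, here is a minimal plain-Java sketch. It does not use Lucene at all: the class TokenCountSketch, its methods, and the whitespace tokenizer are hypothetical stand-ins for a real analyzer and index. It only illustrates that summing each term's total frequency gives the same number as summing document lengths.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TokenCountSketch {

    // Whitespace tokenizer; a simplified stand-in for a real Lucene Analyzer (assumption).
    public static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // The approach from the question: add up the token length of every document.
    public static long sumOfDocLengths(List<String> docs) {
        long total = 0;
        for (String doc : docs) {
            total += tokenize(doc).size();
        }
        return total;
    }

    // For each term, count its occurrences across all documents.
    public static Map<String, Long> totalTermFreqs(List<String> docs) {
        Map<String, Long> freqs = new HashMap<>();
        for (String doc : docs) {
            for (String token : tokenize(doc)) {
                freqs.merge(token, 1L, Long::sum);
            }
        }
        return freqs;
    }

    // Sum the per-term totals; this is the quantity a "sum of total term
    // frequencies" statistic represents for one field.
    public static long sumOfTotalTermFreqs(List<String> docs) {
        long total = 0;
        for (long freq : totalTermFreqs(docs).values()) {
            total += freq;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the quick brown fox",
                "the lazy dog",
                "fox and dog");
        System.out.println(sumOfDocLengths(docs));     // prints 10
        System.out.println(sumOfTotalTermFreqs(docs)); // prints 10
    }
}
```

Both methods count every token exactly once, just grouped differently (by document versus by term). An index that keeps the per-term totals can therefore answer the question without visiting each document.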
Upvotes: 1