Reputation: 1085
In web search engine, the inverted index is usually very large, so the search engine will quit searching when getting enough results. Since traversing to the tail of a long inverted index is time consuming.
How does Lucene handle this case?
For example, if an inverted index of term 'A' consists of 10000 documents, when searching 'A' for 10 results, will Lucene go through all these 10000 documents then return 10 results, or return 10 results when retrieved enough results even if it does not reach the end of inverted index?
Upvotes: 0
Views: 239
Reputation: 9964
Lucene will indeed visit all 10k matches, compute the score for each of those matches and put then in a heap in order to compute the top k hits.
The lucene/misc module has a SortingMergePolicy which allows you to sort merged segments based on a certain field (on a web index, this could be the page rank for instance). This way, if you want to sort documents based on this field at search time (or more generally if the sort order is strongly correlated to the value of this field), you can stop collecting documents per segment as soon as you collected enough matches.
This is currently a very expert feature, but we have plans to make it easier to use, see https://issues.apache.org/jira/browse/LUCENE-6766.
Upvotes: 2