Reputation: 1110
Does anyone know the performance impact of letting Lucene (or Solr) return very long result sets instead of just the usual "top 10"? We would like to return all results (which can be around 100,000 documents) for a user search and then post-process the returned document IDs before returning the actual result.
Our current index contains about 10-20 million documents.
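For context, one way to keep the responses manageable when pulling that many IDs is to page through the result set with `start`/`rows` and restrict the field list to the unique key field. A minimal sketch of the request parameters, assuming a field named `id` as the unique key (an assumption; adjust for your schema):

```python
# Sketch: build the sequence of Solr query-parameter dicts needed to
# page through a large result set, fetching only document IDs.
# Assumes the unique key field is named "id" -- adjust for your schema.

def paged_id_requests(query, total_hits, page_size=10_000):
    """Yield Solr query parameter dicts covering total_hits results."""
    for start in range(0, total_hits, page_size):
        yield {
            "q": query,
            "fl": "id",                                  # only the ID field, no stored content
            "start": start,                              # offset into the result set
            "rows": min(page_size, total_hits - start),  # page size, clipped at the end
            "wt": "json",
        }

pages = list(paged_id_requests("text:foo", total_hits=100_000))
# -> 10 requests of 10,000 IDs each
```

Note that very large `start` offsets get progressively more expensive, so smaller result sets per request are not free either; fetching only `fl=id` at least keeps each response small.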
Upvotes: 3
Views: 2350
Reputation: 21
I was able to get 100,000 rows back in 2.5 seconds with 27 million documents indexed (each document is about 1 KB, with roughly 600 bytes of text fields). The hardware was not ordinary: the machine had 128 GB of RAM. Solr's memory usage looked like this: resident (RES) around 50 GB, virtual (VIRT) around 106 GB.
I started seeing performance degradation after going past 80 million documents. I am currently investigating how to match the hardware to the problem. Hope that helps.
Upvotes: 2
Reputation: 11849
As spraff said, the answer to any question of the form "will X be fast enough?" is: "it depends."
I would be concerned about:
I don't know exactly what your post-processing does, but it's possible it could be accomplished with a custom scoring algorithm instead, so you never have to pull the full result set out of the engine.
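To illustrate the idea (this is not Lucene's actual API, just a toy sketch with made-up data): if the post-processing amounts to re-weighting hits by some application-specific signal, that weighting can often be folded into the score itself, so only the top N ever need to be returned:

```python
import heapq

# Toy sketch: combine the engine's relevance score with an
# application-side boost, then keep only the top-k hits -- the kind of
# logic a custom score implementation could push into the engine
# itself. All data here is made up for illustration.

def rerank(hits, boosts, k=10):
    """hits: iterable of (doc_id, base_score); boosts: doc_id -> multiplier."""
    scored = ((base * boosts.get(doc_id, 1.0), doc_id) for doc_id, base in hits)
    return [doc_id for score, doc_id in heapq.nlargest(k, scored)]

hits = [("a", 1.0), ("b", 0.9), ("c", 0.5)]
boosts = {"c": 3.0}  # business rule: heavily boost doc "c"
print(rerank(hits, boosts, k=2))  # doc "c" now outranks the others
```

Done inside the engine, this kind of boost means the "post-processing" happens per hit during scoring, and only the top-ranked documents ever cross the wire.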
Of course, just because searching all documents will be slower doesn't mean it will be too slow to be useful. Some faceting implementations essentially do visit every matching document, and they perform adequately for many people.
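Conceptually, field faceting is just "for every matching document, count the values of a field" — the engine does it over compact internal doc sets rather than full stored documents, which is why it stays fast even when the match set is large. A toy sketch of the concept, with made-up documents:

```python
from collections import Counter

# Toy sketch of what field faceting does conceptually: visit every
# matching document and tally the values of one field. The documents
# below are made up; a real engine iterates compressed doc-ID sets and
# per-field value dictionaries instead of whole documents.

def facet_counts(matching_docs, field):
    """Count occurrences of each value of `field` across the matches."""
    return Counter(doc[field] for doc in matching_docs if field in doc)

docs = [
    {"id": 1, "category": "books"},
    {"id": 2, "category": "music"},
    {"id": 3, "category": "books"},
]
print(facet_counts(docs, "category"))  # Counter({'books': 2, 'music': 1})
```

The point is that "touch every matching document" is not automatically disqualifying; what matters is how little work is done per document.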
Upvotes: 2