Jeroen

Reputation: 459

Optimize SOLR for retrieving all search results

Sometimes I don't need just the top X results from a SOLR query, but all results (running into millions). This is easily achievable by searching once with 0 rows as a request parameter, and then re-executing the search with the numFound from the result as the number of rows (*). Of course we can sort the results by e.g. "id asc" to remove relevancy ranking. However, I would like to be able to disable the entire scoring calculation for these queries, as it is probably quite computationally intensive and we just don't need the scores in these cases.

My question: Is there a way to make SOLR work in boolean mode and effectively run faster on these often slow queries, when all we need is just all results?

(*) In practice I usually run a paged query where a multi-threaded script walks through the pages, to prevent timeouts on large result sets while keeping it as fast as possible, but this is not important for the question.

This looks like a related question, but apparently the user asked the wrong question and was only after retrieving all results: "Solr remove ranking or modify ranking feature". My question is not answered there.

Upvotes: 0

Views: 198

Answers (2)

Alexandre Rafalovitch

Reputation: 9789

There are a couple of things to be aware of:

  • Solr deep paging (the cursorMark parameter) allows you to export a large number of results much more quickly
  • Using an export format such as CSV could be faster than using an XML format, simply because it needs less formatting and is more compact
  • And, as already mentioned, if you are exporting everything, put your queries into filter queries (fq) with caching turned off
  • For very complex queries, if you can split the query into several steps, you can assign different costs to the filters (the cost local parameter) and have them execute in sequence. This allows you to use a cheap first filter that gets rid of most of the results, and only then apply the more expensive, more precise filters
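The first and third points can be combined in a small client-side loop. The following is only a minimal sketch, not a definitive implementation: it assumes the uniqueKey field is called id, that the response is requested as JSON, and that the select URL passed to make_http_fetch is whatever your own core exposes. The helper names iter_all_docs and make_http_fetch are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def make_http_fetch(select_url):
    """Return a function that sends params to a Solr select handler.

    select_url is an assumption, e.g. "http://localhost:8983/solr/mycore/select".
    """
    def fetch(params):
        with urlopen(select_url + "?" + urlencode(params)) as resp:
            return json.load(resp)
    return fetch


def iter_all_docs(fetch, user_query, unique_key="id", rows=1000):
    """Yield every matching document using cursorMark deep paging.

    The user query goes into an uncached filter query (fq), so the main
    query is just the match-all query and no relevancy scoring is needed.
    """
    params = {
        "q": "*:*",
        "fq": "{!cache=false}" + user_query,
        # cursorMark requires a sort that includes the uniqueKey field.
        "sort": f"{unique_key} asc",
        "rows": str(rows),
        "fl": unique_key,
        "wt": "json",
        "cursorMark": "*",  # "*" starts a new cursor
    }
    while True:
        response = fetch(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        # Solr signals the end by returning the same cursor that was sent.
        if next_cursor == params["cursorMark"]:
            break
        params["cursorMark"] = next_cursor
```

Usage would look like iterating over iter_all_docs(make_http_fetch(url), "title:solr"). Passing the fetch function in keeps the paging logic separate from the HTTP call, so it can be tried against a stub before pointing it at a real index.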

Upvotes: 1

sisve

Reputation: 19781

Use filters (fq) instead of queries (q); there is no score calculation for filters.
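As a rough illustration of what that means for the request itself, the sketch below builds a query string in which the main query is the match-all query and every real condition is sent as a filter. The function name filter_only_query_string, the example field names and the id sort field are assumptions, not something taken from the question.

```python
from urllib.parse import urlencode


def filter_only_query_string(filters, cache=False, sort="id asc"):
    """Build a Solr query string that matches via fq only.

    q is the match-all query, so all conditions are applied as filters.
    With cache=False each filter is prefixed with {!cache=false} so a
    one-off export does not fill the filter cache.
    """
    prefix = "" if cache else "{!cache=false}"
    params = [("q", "*:*")]
    params += [("fq", prefix + f) for f in filters]
    params.append(("sort", sort))
    return urlencode(params)


# Hypothetical field names, for illustration only.
query_string = filter_only_query_string(["type:book", "year:[2000 TO *]"])
```

The resulting string can be appended to the select handler URL of the core being queried.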

Upvotes: 1
