SolrCloud: workaround for classic pagination with "start,rows" parameters

Question

I have SolrCloud with 3 shards.

My goal: select and process all products in a category.

Current implementation: fetching the results in portions in a loop.

1st iteration: q=cat:1&start=0&rows=100
2nd iteration: q=cat:1&start=100&rows=100
3rd iteration: q=cat:1&start=200&rows=100

...

But as "start" grows, performance degrades. The explanation is here: https://wiki.apache.org/solr/DistributedSearch

Makes it more inefficient to use a high "start" parameter. For example, if you request start=500000&rows=25 on an index with 500,000+ docs per shard, this will currently result in 500,000 records getting sent over the network from the shard to the coordinating Solr instance. If you had a single-shard index, in contrast, only 25 records would ever get sent over the network. (Granted, setting start this high is not something many people need to do.)
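
To put rough numbers on the passage above, here is a small back-of-the-envelope sketch. It assumes a simplified model in which the coordinating node asks every shard for its top start + rows entries and then merges them. The function names are made up for illustration and are not part of Solr.

```python
def entries_per_shard(start, rows):
    """Entries one shard must return so the coordinator can pick the page."""
    # The coordinator cannot know in advance which shard holds the documents
    # at positions start..start+rows, so each shard sends its top start+rows.
    return start + rows


def total_merged(start, rows, shards):
    """Entries the coordinator receives and merges for a single page."""
    return shards * entries_per_shard(start, rows)


if __name__ == "__main__":
    # Third iteration from the question, on 3 shards.
    print(total_merged(start=200, rows=100, shards=3))
    # The deep-paging case from the wiki quote, on 3 shards.
    print(total_merged(start=500000, rows=25, shards=3))
```

Under this model the work per page grows linearly with "start", so walking a whole category page by page costs roughly quadratic work overall.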

Any ideas on how I can iterate over all records in a category?

MatsLindh · Accepted Answer

There is a more efficient way to paginate in Solr, cursors, which uses the current position in the sort order instead of an offset. This is particularly useful for deep pagination.

See the section about cursors on the Pagination of Results wiki page. This should speed up delivery, as each server should be able to sort its local documents, decide where the cursor falls in that sequence, and return the next 25 documents after that position.
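
A minimal sketch of cursor-based iteration in Python, using only the standard library. It relies on Solr's documented cursor behaviour: start with cursorMark=*, sort on a clause that includes the uniqueKey field, and stop when the returned nextCursorMark equals the one that was sent. The helper names (solr_select, iter_docs), the "id" uniqueKey field, and the URL in the usage comment are assumptions for illustration, not taken from the question.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select(base_url, params):
    """Send one /select request and return the parsed JSON response."""
    with urlopen(f"{base_url}/select?{urlencode(params)}") as resp:
        return json.load(resp)


def iter_docs(fetch, query, rows=100, sort="id asc"):
    """Yield every document matching `query`, one cursor page at a time.

    `fetch` is any callable that takes a dict of Solr parameters and
    returns the parsed JSON response.
    """
    cursor = "*"  # "*" asks Solr to start at the beginning of the sort order
    while True:
        page = fetch({
            "q": query,
            "rows": rows,
            "sort": sort,  # must include the uniqueKey field (assumed "id")
            "cursorMark": cursor,
            "wt": "json",
        })
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor


# Hypothetical usage; host and collection name are placeholders:
# base = "http://localhost:8983/solr/products"
# for doc in iter_docs(lambda p: solr_select(base, p), "cat:1"):
#     process(doc)
```

Note that the "start" parameter is left out entirely: with a cursor, each request asks for the next "rows" documents after the cursor position, so the cost per page does not grow as you move deeper into the result set.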

UPDATE: Another useful link: coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets
