Building in support for future Solr sharding

Question

We are building an application. Right now we have one Solr server, but we would like to design the app so that it can support multiple Solr shards in the future if our indexing needs outgrow a single server.

What are the key things to keep in mind when developing an application that can support multiple shards in the future?

We store the Solr URL (/solr/) in a DB, and it is used to execute queries against Solr. There is one URL for updates and one URL for searches in the DB.

If we add shards to the Solr environment at a future date, will the process for using the shards be as simple as updating the URLs in the DB? Or are there other things that need to be updated? We are using SolrJ.

E.g. change the SolrSearchBaseURL in the DB to:

https://solr2/solr/select?shards=solr1/solr,solr2/solr&indent=true&q={search_query}

And update the SolrUpdateBaseURL in the DB to:

https://solr2/solr/

?

D_K · Accepted Answer

Basically, what you are describing has already been implemented in SolrCloud. There, ZooKeeper maintains the state of your search cluster (which shards are in which collections, shard replicas, leader and slave nodes, and more). It can spread the load on both the indexing and querying sides by using hashing.
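To make the idea of hash-based distribution concrete, here is a minimal sketch of routing a document to a shard by hashing its ID. It is only an illustration: it is not SolrCloud's actual router, and the class name, method name, and shard URLs are hypothetical.

```java
import java.util.List;

/** Illustrative hash-based router; NOT SolrCloud's actual implementation. */
class HashShardRouter {
    private final List<String> shardUrls;

    HashShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardUrls = List.copyOf(shardUrls);
    }

    /** Maps a document ID to one shard; the same ID always maps to the same shard. */
    String shardFor(String docId) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(docId.hashCode(), shardUrls.size());
        return shardUrls.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical shard URLs, modelled on the hosts named in the question.
        HashShardRouter router = new HashShardRouter(
                List.of("https://solr1/solr", "https://solr2/solr"));
        System.out.println("doc-42 -> " + router.shardFor("doc-42"));
    }
}
```

Note that with a plain modulo scheme like this one, changing the number of shards changes where most documents land, so adding a shard generally means reindexing.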

You could, in principle, get by (at least in the beginning of your cluster growth) with the system you have developed. But think about replication, adding load balancers, and external cache servers (such as Varnish): in the long run you would end up implementing something like SolrCloud yourself.
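If you do stay with the hand-rolled setup for a while, one option is to keep the list of shard base URLs as its own config value and derive the `shards` parameter from it, instead of baking the parameter into the stored search URL. A minimal sketch, assuming the shard URLs are stored in the form used in the question; the class and method names are hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that derives the value of the "shards" request parameter. */
class ShardsParam {

    /** Strips the scheme and trailing slashes from each URL and joins them with commas. */
    static String build(List<String> shardBaseUrls) {
        return shardBaseUrls.stream()
                .map(url -> url.replaceFirst("^https?://", "")) // shards entries carry no scheme
                .map(url -> url.replaceAll("/+$", ""))          // drop any trailing slash
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Prints: solr1/solr,solr2/solr
        System.out.println(build(List.of("https://solr1/solr", "https://solr2/solr/")));
    }
}
```

With SolrJ, the resulting value would typically be passed as a request parameter, for example via `SolrQuery.set("shards", value)`, rather than appended to the base URL, so the stored base URL can stay a plain `/solr/` endpoint.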

Having said that, there are some caveats to using hash-based indexing and hence searching. If you want to implement logical partitioning of your data (say, by date), at this point there is no way to do this other than writing custom code. There is some work planned around this, though.
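As a rough idea of what such custom code might look like, the sketch below picks a shard by document date. The partition boundaries, shard URLs, and class name are all assumptions made for illustration, not part of Solr or SolrJ.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical date-based router: each shard covers dates from its start date onward. */
class DateShardRouter {
    // Key: first date covered by the shard; value: shard URL.
    private final TreeMap<LocalDate, String> shardsByStartDate = new TreeMap<>();

    DateShardRouter(Map<LocalDate, String> shardsByStartDate) {
        if (shardsByStartDate.isEmpty()) {
            throw new IllegalArgumentException("at least one partition is required");
        }
        this.shardsByStartDate.putAll(shardsByStartDate);
    }

    /** Returns the shard whose start date is the latest one not after docDate. */
    String shardFor(LocalDate docDate) {
        Map.Entry<LocalDate, String> entry = shardsByStartDate.floorEntry(docDate);
        if (entry == null) {
            // Older than every configured partition: fall back to the oldest shard.
            entry = shardsByStartDate.firstEntry();
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        DateShardRouter router = new DateShardRouter(Map.of(
                LocalDate.of(2012, 1, 1), "https://solr1/solr",
                LocalDate.of(2013, 1, 1), "https://solr2/solr"));
        // Prints: https://solr1/solr
        System.out.println(router.shardFor(LocalDate.of(2012, 6, 15)));
    }
}
```

The same lookup can be used on the query side to restrict a date-bounded search to only the shards that can contain matching documents.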
