gaoagong
gaoagong

Reputation: 1255

Solr Collection Creation and Custom Collection Libraries

I've run into a situation where we sometimes have to completely wipe an index and then re-index a collection. This process of course takes a lot of time. I don't want to allow for any or at least extended downtime in Prod. Thus, I am researching into a way in Solr to create a new Collection that is a copy of an old collection, but without the data. I can re-index this new Collection with little or any service degradation. I then want to use aliases to point the new Collection to the alias that our clients are using so that they will start using the new Collection without even knowing it.

I'm currently running 4.2, but wondering if I shouldn't upgrade to 4.7 in order to support this better. Seems like 4.2 has most of the same Collection API support.

One of the first snags I'm hitting is that the Collection I am copying has a lib folder in it with customer libraries. If possible I want to push these out to the solrhome/lib folder so that they only get loaded once. My problem with this is that if I have different versions of say a custom data importer, then I will run into classloader issues.

Has anyone successfully implemented this kind of scenario and could provide some insight into the pitfalls and successes that you had and what worked for you?

More Details... I have many different collections that are part of this Solr cloud. I don't want to effect any of the other collections if possible while making changes to the new copied collection.

Upvotes: 0

Views: 226

Answers (1)

buddy86
buddy86

Reputation: 1454

I'm also having a similar kind of situation where I might to modify solr schema and need to re index the whole data. But I don't have much downtime in production. So, we came up with a solution like..

Let's say I have a SolrCloud1 (existing one), with collection1 (It has it's own structure). I have my application running in different machine. There is a load-balancer in between my SolrCloud1 and application.

Now, create a separate SolrCloud (say, SolrCloud2) with collection1. Maintain the same structure as it was previously. Now, do the re indexing part in this SolrCloud2. When it's done, make available the new SolrCloud under the load-balancer. When the new SolrCLoud2 is up, shut down the SolrCloud1.

Thus without any production down time you'll re index the data. Users won't be able to know anything about this. Hope this will help.

Upvotes: 1

Related Questions