Reputation: 11
I'm trying to set up multiple Solr cores (the data for each core is indexed using Norconex, crawling entirely separate sites). The schema and solrconfig files are the same for all cores, but each core has its own copy in its respective conf folder.
When I run a query in the admin UI against core 1, I get a mix of results that includes documents indexed to cores 2 and 3 as well. How do I keep them entirely separate? It was my understanding that having separate cores would isolate them by default.
I've tried clearing all documents from cores 2 and 3, but core 1 still pulls up their docs. Thanks for any help anyone can provide.
Upvotes: 1
Views: 88
Reputation: 98
The issue you're describing sounds like it could be that you have cores 1 through 3 on the same shard. That would make them replicas of each other, so they hold the same data. If core1 were killed and replaced with another core, data from the other cores would be replicated to the new core when it was added to the collection.
If you want subsets of documents in three separate cores (the physical locations), then those cores need to live in three separate shards (the logical locations). This can be accomplished using routing.
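For context, routing only applies to a SolrCloud collection that actually has multiple shards. A minimal sketch of creating such a collection through SolrJ's Collections API wrapper, assuming a local node and placeholder collection and config set names ("sites", "sites_conf"):

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateRoutedCollection {
    public static void main(String[] args) throws Exception {
        // Placeholder base URL; point this at any node of your SolrCloud cluster.
        try (HttpSolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            // Three shards, one replica each; compositeId is the default router
            // when numShards is specified.
            CollectionAdminRequest.createCollection("sites", "sites_conf", 3, 1)
                    .process(client);
        }
    }
}
```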
The compositeId router will let you send documents or queries to specific shards. The documentation shows an example of using data from a company field as part of the routing key value like this: "IBM!12345"
The exclamation point is a separator that breaks the key into the parts used for creating the shard hash value. This allows "IBM" data to be sent to one shard and "YOYODYNE" data to another.
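As an illustration, indexing a document whose id carries that kind of prefix could look like the following SolrJ sketch; the base URL, collection name, and title field are assumptions, not anything from the original post:

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexWithRoutingPrefix {
    public static void main(String[] args) throws Exception {
        // Placeholder URL/collection; substitute your own SolrCloud node and collection.
        try (HttpSolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/sites").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            // The "IBM!" prefix (everything before the '!') decides which shard gets this document.
            doc.addField("id", "IBM!12345");
            doc.addField("title", "Example document routed by tenant prefix");
            client.add(doc);
            client.commit();
        }
    }
}
```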
If "YOYODYNE" had way more documents than "IBM", then you might want to spread documents for "YOYODYNE" across multiple shards. The documentation says to use something like this:
Another use case could be if the customer "IBM" has a lot of documents and you want to spread it across multiple shards. The syntax for such a use case would be: shard_key/num!document_id where the /num is the number of bits from the shard key to use in the composite hash.
So IBM/3!12345 will take 3 bits from the shard key and 29 bits from the unique doc id, spreading the tenant over 1/8th of the shards in the collection. Likewise, if the num value were 2, it would spread the documents across 1/4th of the shards. At query time, you include the prefix(es) along with the number of bits in your query with the _route_ parameter (i.e., q=solr&_route_=IBM/3!) to direct queries to specific shards.
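A minimal SolrJ sketch of such a routed query, again assuming a placeholder base URL and collection name:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QueryWithRoute {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/sites").build()) {
            SolrQuery query = new SolrQuery("solr");
            // Same prefix and bit count used at index time, so only the shards
            // that can hold "IBM" documents are consulted.
            query.set("_route_", "IBM/3!");
            QueryResponse rsp = client.query(query);
            System.out.println("Matches: " + rsp.getResults().getNumFound());
        }
    }
}
```

Because the _route_ value reuses the index-time prefix and bit count, the query is directed only to the shards that can contain that tenant's documents instead of fanning out to the whole collection.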
Upvotes: 0
Reputation: 9789
This should not be happening, so something has gone wrong. Possible causes, from most likely on down:
You did not say what happens when you query core2. If it has no documents, then the first possibility is the most likely one. If it does, there may be other issues in play.
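One quick way to check is to run a match-all query against each core's own endpoint and compare the counts. A small sketch, assuming the default host/port and core names core1 through core3:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CountDocsPerCore {
    public static void main(String[] args) throws Exception {
        // Core names and base URL are assumptions; substitute your own.
        for (String core : new String[] {"core1", "core2", "core3"}) {
            try (HttpSolrClient client =
                    new HttpSolrClient.Builder("http://localhost:8983/solr/" + core).build()) {
                // *:* matches every document in the core, so numFound is the core's doc count.
                long count = client.query(new SolrQuery("*:*")).getResults().getNumFound();
                System.out.println(core + " holds " + count + " documents");
            }
        }
    }
}
```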
Upvotes: 1