Reputation: 121
In one of my applications we need to index a huge amount of data (30 GB). We are using Solr to index this data, and we have 50 fields in our schema.xml. I am indexing data from different databases.
But at indexing time the data for all fields is not available, so we have created multiple cores and index each core separately.
Ex:- in Core 0 we index 5 fields using a separate query:
SELECT Field1, Field2, Field3, Field4, Field5 FROM dual
Field1 --- common field across cores
Field2 --- field indexed in this core
Field3 --- field indexed in this core
Field4 --- field indexed in this core
Field5 --- field indexed in this core
So in core0 all fields other than these 5 will be null.
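For context, we load each core from its database with a separate query; if this is done through Solr's DataImportHandler, the per-core data-config looks roughly like the sketch below (the datasource driver, URL, and credentials are placeholders, not our real settings):

```xml
<dataConfig>
  <!-- Placeholder JDBC connection details -->
  <dataSource driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@localhost:1521:XE"
              user="user" password="pass"/>
  <document>
    <!-- One entity per core, selecting only the fields that core indexes -->
    <entity name="core0"
            query="SELECT Field1, Field2, Field3, Field4, Field5 FROM dual">
      <field column="Field1" name="Field1"/>
      <field column="Field2" name="Field2"/>
      <field column="Field3" name="Field3"/>
      <field column="Field4" name="Field4"/>
      <field column="Field5" name="Field5"/>
    </entity>
  </document>
</dataConfig>
```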
Next, for Core 1 we index 3 fields using a separate query:
SELECT Field1, Field6, Field7 FROM dual
Field1 --- common field across cores
Field6 --- field indexed in this core
Field7 --- field indexed in this core
We are using common schema.xml for all cores.
For querying we wrote a custom request handler that queries each core separately and then merges the results. The data in each core is refreshed every 3 hours. I have tried the partial (atomic) update feature in Solr 4.0, but it too takes a long time to index, so it was not very helpful.
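The merge step of the custom handler is conceptually a join on the common field; a minimal Python sketch (the field names are illustrative, matching the per-core scheme above):

```python
# Combine partial documents returned by several cores into full documents,
# joining on the field that is common across cores (Field1).
def merge_core_results(*core_results):
    """Each argument is a list of partial docs (dicts) from one core."""
    merged = {}
    for docs in core_results:
        for doc in docs:
            key = doc["Field1"]  # common field across cores
            merged.setdefault(key, {}).update(doc)
    return list(merged.values())

core0 = [{"Field1": "k1", "Field2": "a", "Field3": "b"}]
core1 = [{"Field1": "k1", "Field6": "x", "Field7": "y"}]
print(merge_core_results(core0, core1))
# -> [{'Field1': 'k1', 'Field2': 'a', 'Field3': 'b', 'Field6': 'x', 'Field7': 'y'}]
```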
Is there any better approach/design to handle this problem?
Thanks, ravi
Upvotes: 0
Views: 120
Reputation: 4829
You can use shards to query across multiple cores. You can do this from any code.
e.g.:
solr/core1/select/?q=iPad&shards=localhost:8983/solr/core1,localhost:8983/solr/core0
You can pass as many cores as you want in shards, separated by commas.
Upvotes: 0