user1097437

Reputation: 121

How to index and query data that is not available at index time using Solr

In one of my applications we need to index a large amount of data (30 GB). We are using Solr to index it, and we have 50 fields in our schema.xml. The data comes from several different databases.

However, the data for all fields is not available at the same time when indexing. So we have created multiple cores, and we index each core separately.

For example, in core 0 we index 5 fields using a separate query:

Select Field1 ,Field2 ,Field3 ,Field4 ,Field5 from dual.

Field1: common field across cores
Field2: field which is indexed in this core
Field3: field which is indexed in this core
Field4: field which is indexed in this core
Field5: field which is indexed in this core

So all the remaining fields in core 0, other than the 5 fields above, will be null.

Next, in core 1 we index 3 fields using a separate query:

Select Field1 ,Field6 ,Field7 from dual.

Field1: common field across cores
Field6: field which is indexed in this core
Field7: field which is indexed in this core

We use a common schema.xml for all cores.

For querying, we wrote a custom request handler which queries each core separately and then merges the results. The data in each core is refreshed every 3 hours. I have also tried the partial update feature in Solr 4.0, but it takes too much time to index, so it was not very helpful.
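To make the merge step concrete, here is a minimal sketch of the idea in Python. It is not our actual handler code: the function name `merge_by_key`, the sample documents, and the assumption that each core returns a list of dicts keyed by field name are all hypothetical and only for illustration.

```python
def merge_by_key(results_per_core, key="Field1"):
    """Combine documents from several cores into one record per key value.

    results_per_core: a list with one entry per core; each entry is a list
    of documents (dicts). Fields that are None are skipped, so a core that
    does not index a field never overwrites a value from another core.
    """
    merged = {}
    for docs in results_per_core:
        for doc in docs:
            target = merged.setdefault(doc[key], {})
            for name, value in doc.items():
                if value is not None:
                    target[name] = value
    return list(merged.values())


# Hypothetical sample data: core 0 holds Field2, core 1 holds Field6 and Field7.
core0_docs = [{"Field1": "A", "Field2": "x", "Field6": None}]
core1_docs = [{"Field1": "A", "Field6": "y", "Field7": "z"}]

combined = merge_by_key([core0_docs, core1_docs])
print(combined)
```

The point is that Field1 acts as the join key, and every other field comes from whichever core actually indexes it.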

Is there any better approach/design to handle this problem?

Thanks, ravi

Upvotes: 0

Views: 120

Answers (1)

Ram G

Reputation: 4829

You can use the shards parameter to query across multiple cores in a single request. Since it is just a request parameter, you can do this from any client code.

For example:

solr/core1/select/?q=iPad&shards=localhost:8983/solr/core1,localhost:8983/solr/core0

You can pass as many cores as you want in shards, separated by ",".
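If you build the request from code, a minimal sketch in Python could look like this. The helper name `build_sharded_query` is made up for illustration, and the host, port, and core names are simply the ones from the example URL above; adjust them to your setup.

```python
from urllib.parse import urlencode


def build_sharded_query(base_url, query, shards):
    """Return a select URL that asks one core to search the listed shards."""
    params = {"q": query, "shards": ",".join(shards)}
    # Keep ':', '/' and ',' unescaped so the shards list stays readable.
    return f"{base_url}/select?{urlencode(params, safe=':/,')}"


url = build_sharded_query(
    "http://localhost:8983/solr/core1",
    "iPad",
    ["localhost:8983/solr/core1", "localhost:8983/solr/core0"],
)
print(url)
```

This only builds the URL; sending it with whatever HTTP client you already use is enough, because the core that receives the request forwards the query to each shard and merges the results.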

Upvotes: 0
