Prasanna Josium

Reputation: 73

Dynamic Solr core selection based on query parameter

I use multiple Solr cores to index products from different segments, e.g. one core contains pharma products (core1), another contains groceries (core2), and a third contains electronics (core3). I also have cores for the categories and brands of products within those segments.

The problem I have to solve is this: when users search for something, they should not need to be aware of these cores. Based on a query parameter the user sends, I want to figure out the right core(s) to search. For example, if the user query is ../select?q=apple&seg=0x3e, then based on the parameter seg=0x3e, core1 and core2 would be searched but not core3; for a different seg value, a different set of cores would be searched.

I could make this work with a sharded search, and I also configured the shards in solrconfig.xml. A good lead was provided here. But this approach seems too static, and I cannot limit or select shards based on the query parameter.

Is there a Solr way of doing this, such as a custom SolrDispatchFilter?

Thanks

Upvotes: 1

Views: 106

Answers (1)

Hugo Zaragoza

Reputation: 601

Sorry for this "negative" answer :), but I have a lot of experience with search and SOLR and really I would not do what you suggest.

Partitioning an index by "topics" is not usually a good idea. You have to do an awful lot of manual juggling, and it is not maintainable as the number of categories, docs and/or cores grows. Eventually you will set up a cluster (i.e. shards, as you point out) and keep adding cores and docs indiscriminately to it. You can always tag your docs by collection or topic for maintenance reasons, but mapping these to cores is hard to maintain.

Given that your index is already partitioned, you can treat it as a distributed index and just hit every core with every query using the SOLR shards query parameter. I would not bother writing a dispatcher, because simple search queries that return zero results are very fast and cheap, so hitting your "other" cores with "useless" queries is not much of a problem. For this reason it is probably not worth the effort to build a specialized dispatcher, which again is hard to maintain manually.
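To make the "hit every core" option concrete, here is a minimal sketch of how a client could build such a request. The host (localhost:8983) and the core names are assumptions taken from the question, and build_fanout_url is a hypothetical helper name; only the shards parameter itself is standard SOLR.

```python
from urllib.parse import urlencode

# Assumed host and core names; adjust to your own deployment.
SOLR_HOST = "localhost:8983"
ALL_CORES = ("core1", "core2", "core3")


def build_fanout_url(query, cores=ALL_CORES, host=SOLR_HOST):
    """Build a /select URL that queries every listed core via `shards`."""
    # `shards` is a comma-separated list of host:port/solr/<core> entries.
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    # Keep '/', ',' and ':' unescaped so the shard list stays readable.
    params = urlencode({"q": query, "shards": shards}, safe="/,:")
    # The first core receives the request and merges the shard results.
    return f"http://{host}/solr/{cores[0]}/select?{params}"


if __name__ == "__main__":
    print(build_fanout_url("apple"))
```

The core that receives the request only coordinates the distributed search, so it does not matter much which one you send it to.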

There is a special case where a dispatcher would be worth it: if your query is very complex (filtering on different fields) and SOLR takes time to figure out that there are no results to return, then it is worth doing what you propose. I don't know of a way to do this in SOLR configuration; you'd need to write your own query handler.
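If you do go down that road, one alternative to a custom handler inside SOLR is a thin dispatcher in front of it (in your application or a proxy) that turns seg into a shards list. The sketch below assumes seg is a hexadecimal bitmask and invents a bit-to-core assignment purely for illustration; CORE_BITS, cores_for_segment and build_dispatch_url are hypothetical names, not part of SOLR.

```python
from urllib.parse import urlencode

SOLR_HOST = "localhost:8983"  # assumed host, adjust as needed

# Hypothetical mapping from segment bits to cores; the real meaning of
# the bits in `seg` is not given in the question.
CORE_BITS = {"core1": 0x02, "core2": 0x04, "core3": 0x40}


def cores_for_segment(seg):
    """Return the cores whose bit is set in the hex bitmask `seg`."""
    mask = int(seg, 16)  # accepts strings such as "0x3e"
    return [core for core, bit in CORE_BITS.items() if mask & bit]


def build_dispatch_url(query, seg, host=SOLR_HOST):
    """Build a /select URL restricted to the cores selected by `seg`."""
    cores = cores_for_segment(seg)
    if not cores:
        raise ValueError(f"no core matches seg={seg}")
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    # Keep '/', ',' and ':' unescaped so the shard list stays readable.
    params = urlencode({"q": query, "shards": shards}, safe="/,:")
    return f"http://{host}/solr/{cores[0]}/select?{params}"


if __name__ == "__main__":
    print(build_dispatch_url("apple", "0x3e"))
```

With this made-up mapping, seg=0x3e selects core1 and core2 but not core3. Keep in mind that the mapping table is exactly the part that gets hard to maintain as cores are added.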

Upvotes: 1
