Eduard BABKIN

Reputation: 113

How to design multiple concurrent imports using DIH in SOLR?

There is a case where an external application should send an unknown number of different indexing requests to SOLR. Those requests should be processed by SOLR Data Import Handlers according to the config submitted inside each request.

There is a SOLR constraint - only one indexing request at a time can be processed by a particular DIH. Because the number of requests can be quite large and they arrive in parallel, it is impractical to define multiple DIH specifications in solrconfig.xml.
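For context, each DIH endpoint has to be declared up front in solrconfig.xml along these lines (handler name and config file name are illustrative), which is why pre-defining one handler per possible request doesn't scale:

```xml
<!-- One handler per import pipeline; each can run only one import at a time -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```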

How can that problem be overcome?

Maybe SOLR provides some admin API to create DIH specifications dynamically from a client?

Upvotes: 1

Views: 281

Answers (1)

MatsLindh

Reputation: 52902

The best way to do this is to create a layer outside of Solr that handles your import tasks. Using DIH will limit what you can do (as you've discovered), and will be hard to make work properly in parallel across multiple nodes and indexing services (it's designed for a far simpler scenario).

Using a simple queue (Redis, Celery, ApacheMQ, whatever fits your selection of languages and technology) that the external application can put requests into, and that your indexing workers pick tasks up from, will be scalable and customizable. It'll allow you to build out onto multiple index nodes as the number of tasks grows, and it'll allow you to pull data from multiple sources as necessary (and apply caching if required).
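A minimal sketch of that queue-plus-workers pattern, using Python's in-process `queue.Queue` as a stand-in for the external broker (Redis, ActiveMQ, etc.) and a stub in place of the actual Solr `/update` call, so the control flow is runnable on its own; all names here are hypothetical:

```python
import queue
import threading

tasks = queue.Queue()   # stands in for the external message queue
indexed = []            # stands in for the Solr index

def index_batch(docs):
    # Real code would POST the documents to Solr, e.g.:
    #   requests.post(f"{SOLR_URL}/update?commit=true", json=docs)
    # Here we just collect them so the sketch runs standalone.
    indexed.extend(docs)

def worker():
    while True:
        task = tasks.get()
        if task is None:          # sentinel: shut this worker down
            tasks.task_done()
            break
        index_batch(task["docs"])
        tasks.task_done()

# Several workers can drain the same queue in parallel; scaling out
# just means starting more workers (or more nodes running them).
threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()

# The external application enqueues indexing requests instead of
# hitting a pre-configured DIH endpoint.
for i in range(10):
    tasks.put({"docs": [{"id": i, "title": f"doc {i}"}]})

tasks.join()                      # wait until every request is processed
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()

print(len(indexed))               # all 10 documents reached the stub index
```

Because each task carries its own payload (and could carry its own source/config description), this sidesteps the one-import-per-DIH limitation entirely.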

Upvotes: 1
