Reputation: 3709
I deployed an instance of Solr on an Ubuntu machine with Tomcat. I have a single-threaded client program that reads data and injects it into Solr. Watching memory and CPU usage, I noticed that I still have plenty of spare resources (both memory and CPU). Should I change my indexing code to use multiple threads to inject into Solr? With the current single-threaded program, indexing 20 million documents takes about 14 hours, which is why I am considering multi-threading. Thanks in advance for your suggestions and help! :)
Upvotes: 4
Views: 6845
Reputation: 4284
Multi-threading while indexing in Solr is widely used. It is not clear from your question whether you can also multi-thread the reading from your source, but I think that is the way to go. I suggest you try it, but first analyze your code to see which part is the slowest, and make sure that part is included in the multi-threading.
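To make the idea concrete, here is a minimal sketch of a multi-threaded indexing client in Python, using only the standard library. It is not the asker's code: `send_batch` is a hypothetical callable that posts one list of documents to Solr, and the defaults for `workers` and `batch_size` are arbitrary assumptions to tune by measurement.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(docs, size):
    """Yield lists of up to `size` documents from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_parallel(docs, send_batch, workers=4, batch_size=1000):
    """Send batches from several threads and return the number of docs sent.

    `send_batch` is a hypothetical function (not part of any Solr client)
    that sends one list of documents to Solr and returns when it is done.
    """
    def send_and_count(batch):
        send_batch(batch)
        return len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits every batch up front, so for very large
        # inputs a bounded queue would keep memory usage lower.
        # An exception raised in a worker is re-raised here.
        return sum(pool.map(send_and_count, batched(docs, batch_size)))
```

A call might look like `index_parallel(read_source(), post_to_solr, workers=8)`, where both `read_source` and `post_to_solr` are placeholders for your own code. If reading from the source is the slow part, threads that only speed up the sending side will not help much, which is why measuring first matters.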
Also keep an eye on your commit strategy.
From the Solr documentation (http://wiki.apache.org/solr/SolrPerformanceFactors): "In general, adding many documents per update request is faster than one per update request. ... Reducing the frequency of automatic commits or disabling them entirely may speed indexing. Beware that this can lead to increased memory usage, which can cause performance issues of its own, such as excessive swapping or garbage collection."
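As a rough illustration of that advice, the sketch below sends many documents per update request and issues a single commit at the end. It assumes a Solr version whose `/update` handler accepts JSON; the URL and the core name `mycore` are made up for the example, and older installations may expose a different path.

```python
import json
import urllib.request

# Assumed endpoint: adjust host, port, and core name for your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def build_request(payload, url=SOLR_UPDATE_URL):
    """Build a JSON POST request for the update handler."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(request):
    """Send the request and discard the response body."""
    with urllib.request.urlopen(request) as response:
        response.read()


def index_all(batches, send=post):
    """Add each batch in one request, then commit once at the end."""
    count = 0
    for batch in batches:
        send(build_request(list(batch)))  # many documents per request
        count += 1
    send(build_request({"commit": {}}))  # one explicit commit, not one per batch
    return count
```

The `send` parameter exists only so the function can be tried without a running server. Committing once at the end is the extreme case; as the quoted text warns, holding off commits for a long run can raise memory usage, so a commit every so many batches may be a safer middle ground.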
Upvotes: 4