dcg

Reputation: 1271

Increase Solr search concurrency

Short story: I cannot run more than 2 simultaneous searches against Solr 5 (same story with 4.10) from a single client process. Is there a flag in the configuration file that I missed? I have verified that it is neither a hardware problem nor a client software problem. See below for the full story.

Long story:

I need to build a word-based search engine (fields generally contain only one word/value; even in a multi-valued field, every value is a single word), and 60-70% of the searches are without wildcards. The expected core size is around 50K documents with an average of 20 fields. The collection is expected to be updated about once a week (probably even less often), so I don't really care about indexing time. I think we can safely assume there will be no writes, just reads, which minimizes the probability of locks and other concurrency issues. Also, the most "expensive" query in my test takes around 150 ms (as per Solr's QTime). I have a batch of 10K randomly generated searches and, no matter what I do, I cannot finish them in less than 5 minutes. It makes no difference how many threads I open on the client side or what values I set in the configuration files. Meanwhile, CPU usage peaks at 30-40%, with only 30% of memory in use.

What I have tried:

  1. solr5 + jetty on a single-core virtual machine with 3GB RAM;
  2. solr5 + jetty on a dual-core virtual machine with 6GB RAM (4GB for java);
  3. solr5 + tomcat6 on a dual-core virtual machine with 6GB RAM;

Using netstat -a -n | grep @port, for #1 and #2 I saw only 2 active (ESTABLISHED) connections at any given time, never more. For #3, besides those 2 active connections, there were another 10-15 in the TIME_WAIT state (not active).

I am somewhat lost here ... I am not a Java ninja and I am not savvy with Java-related products and their configuration. I used 2 different servlet containers and got pretty much the same problem. IMO, it's obvious that something is throttling the active connections, and I don't know how to find out what and why.

As a side note (I am not sure whether it is important): I copied the same tool to another machine and started the "stress" test there at the same time as the one on my machine. The number of active connections doubled (per netstat), resource usage was only slightly higher than in the single-machine test, and the execution time was identical on both machines: 5 minutes.

So, what should I do to remove this limit - or at least to increase it?

Upvotes: 1

Views: 643

Answers (1)

dcg

Reputation: 1271

As usual, the problem lies between the chair and the keyboard. :(

The client was written in C# using the plain old WebRequest class, which obeys the system limit on concurrent HTTP connections to the same address (2 by default, to avoid DoS).

After reading this article, I realized where the problem was. So, the following tweak in app.config solved the issue:

<system.net>
    <connectionManagement>
        <add address = "*" maxconnection = "300" />
    </connectionManagement>
</system.net>

With this change, the whole batch of requests finished in around one minute using 16 client threads, and the additional active connections were visible in netstat.
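For anyone who would rather not touch app.config, the same per-host limit can, as far as I know, also be raised in code through ServicePointManager.DefaultConnectionLimit. This is only a minimal sketch of that alternative, not what I actually used: the Solr URL and core name below are hypothetical placeholders, and the value 300 simply mirrors the config above.

```csharp
using System;
using System.Net;

class ConnectionLimitSketch
{
    static void Main()
    {
        // Raise the default per-host connection limit for this process.
        // Set it before the first request is made, because a ServicePoint
        // picks up the default value at the moment it is created.
        ServicePointManager.DefaultConnectionLimit = 300;

        // Hypothetical Solr endpoint, used only to inspect the effective limit.
        var solrUri = new Uri("http://localhost:8983/solr/mycore/select?q=*:*");
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(solrUri);

        // Show the limit that WebRequest calls to this host will obey.
        Console.WriteLine("Connection limit for {0}: {1}",
            solrUri.Host, servicePoint.ConnectionLimit);
    }
}
```

Either way, the number of client threads still has to be large enough to actually use the extra connections.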

Upvotes: 1
