Optimizing Solr 4 on EC2 debian instance(s)

Question

My Solr 4 instance is slow and I don't know why. I am trying to tune the configuration of the JVM, Tomcat 6 and Solr 4 to improve performance, with queries per second as the key metric. I am currently running on an EC2 small instance with Debian squeeze, but I am ready to switch to Ubuntu if needed.

There is nothing special about my use case. The index is small. Queries do include a moderate number of unions (e.g. 10), plus faceting, but I don't think that's unusual.

My understanding is that these areas could need tweaking:

Configuring the JVM Garbage collection schedule and memory allocation ("GC tuning is a precise art form", ref)
Other JVM settings
Solr's Query Result cache, Filter cache, Document cache settings
Solr's Auto-warming settings

There are a number of ways to monitor the performance of Solr:

But none of these methods indicate which settings need to be adjusted, and there's no guide that I know of that steps through an exhaustive list of settings that could possibly improve performance. I've reviewed the following pages (one, two, three, four), and gone through some rounds of trial and error so far without improvement.

Questions:

How do I tell the JVM to use all of the 2 GB of memory on the small EC2 instance?
How do I debug and optimize JVM garbage collection?
How do I know when I/O throttling, such as the new EBS IOPS pricing, is the issue?
Using figures like the NewRelic examples below, how do I detect problematic behavior, and how should I approach solutions?

Answers:

I'm looking for links to good documentation on setting up and optimizing Solr 4 from a DevOps or server admin perspective (not index or application design).
I'm looking for the top trouble spots in catalina.sh, solrconfig.xml, solr.xml (others?) that are the most likely causes of problems.
Or any tips you think address the questions.


Pierre Laporte · Accepted Answer

First, you should not focus on switching your Linux distribution. A different distribution might bring some changes, but given the information you provided, nothing suggests that those changes would be significant.

You mention many possible areas for optimisation, which can be overwhelming. You should only start tweaking an area once you have proven that the problem lies in that particular part of your stack.

JVM Heap Sizing

You can use the parameter -Xmx1700m to give the JVM a maximum heap of 1.7 GB. HotSpot might not need all of it, so don't be surprised if the heap never grows to that size. Bear in mind that the instance has only 1.7 GiB in total, and that the operating system and the JVM's non-heap areas need memory as well, so a lower maximum may be safer.

You should set the minimum heap size to a low value so that HotSpot can optimise its memory usage. For instance, to set a minimum heap size of 128 MB, use -Xms128m.
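As an illustration, the two heap settings could be passed to Tomcat through JAVA_OPTS. This is only a sketch: it uses the standard -Xms (minimum) and -Xmx (maximum) spellings, and it assumes the options live in a bin/setenv.sh file read by catalina.sh. On a packaged install the right place may be a different file (for instance /etc/default/tomcat6 on Debian), so treat the location as an assumption.

```shell
# Sketch of a Tomcat bin/setenv.sh (the file location is an assumption;
# use whichever file defines JAVA_OPTS on your install).

# -Xms: initial (minimum) heap size, kept low so HotSpot can grow on demand
# -Xmx: maximum heap size, the 1.7 GB figure from the answer
JAVA_OPTS="${JAVA_OPTS:-} -Xms128m -Xmx1700m"
export JAVA_OPTS
```

After a restart, both flags should show up in the command line of the Java process (for example in the output of ps).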

Garbage Collector

From what you say, you have limited hardware (1 core at 1.2 GHz at most, see this page):

M1 Small Instance

1.7 GiB memory

1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)

...

One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor

Therefore, using the low-latency collector (CMS) won't do any good: it cannot run concurrently with your application because you have only one core. You should switch to the throughput collector using -XX:+UseParallelGC -XX:+UseParallelOldGC.

Is the GC really a problem?

To answer that question, you need to turn on GC logging. It is the only way to see whether GC pauses are responsible for your application's response time. Turn it on with -Xloggc:gc.log -XX:+PrintGCDetails.
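One rough way to read the resulting gc.log is to add up the reported pause times. The helper below is a sketch, not a tool shipped with the JDK: gc_pause_summary is a made-up name, and the pattern assumes log lines in the Java 6/7 PrintGCDetails style written by the parallel collectors, where each collection line ends in ", <seconds> secs]" followed by a [Times: ...] block. Logs from other collectors or JVM versions would need a different pattern.

```shell
# Summarise the GC pauses recorded in a HotSpot GC log.
# Assumption: each collection line contains ", <seconds> secs]" exactly once
# before the "[Times: ...]" block (Java 6/7, parallel collectors).
gc_pause_summary() {
    # $1: path to the GC log file
    awk '
        match($0, /, [0-9]+\.[0-9]+ secs\]/) {
            # The match starts at the comma: skip ", " and drop " secs]".
            pause = substr($0, RSTART + 2, RLENGTH - 8) + 0
            total += pause
            count++
            if (pause > max) max = pause
        }
        END {
            printf "collections=%d total=%.3fs max=%.3fs\n", count, total, max
        }
    ' "$1"
}
```

Calling gc_pause_summary gc.log prints a single summary line. If the total is a small fraction of the period covered by the log, garbage collection is unlikely to be what slows the queries down.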

But I don't think the problem lies here.

Is it a hardware problem?

To answer this question, you need to monitor resource utilization (disk I/O, network I/O, memory usage, CPU usage). You have a lot of tools to do that, including top, free, vmstat, iostat, mpstat, ifstat, ...

If you find that some of these resources are saturated, then you need a bigger EC2 instance.
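A quick way to take such a look is to run a few of these tools for a short, bounded period while Solr is under load. The wrapper below is only a sketch: resource_snapshot is a hypothetical name, and it assumes free, vmstat and iostat are installed (iostat normally comes with the sysstat package).

```shell
# Print a short snapshot of memory, CPU and disk activity.
# Usage: resource_snapshot [interval_seconds] [samples]
resource_snapshot() {
    interval=${1:-5}
    samples=${2:-3}

    echo "== memory (MB) =="
    free -m

    # First report is the average since boot; later ones cover each interval.
    # Watch r (run queue), si/so (swapping) and wa (time waiting for I/O).
    echo "== cpu, swap, run queue =="
    vmstat "$interval" "$samples"

    # Extended per-device statistics; %util close to 100 means a busy disk.
    echo "== per-device disk I/O =="
    iostat -x "$interval" "$samples"
}
```

A run queue that stays above the number of cores, non-zero si/so values, or a high wa percentage point to CPU, memory and disk pressure respectively.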

Is it a software problem?

In your stats, the document cache hit rate and the filter cache hit rate are healthy. However, I think the query result cache hit rate is pretty low. This implies that many queries are actually executed rather than served from the cache.

You should monitor the query execution time. Depending on that value you may want to increase the cache size or tune the queries so that they take less time.
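Solr reports its own processing time for each request as QTime (in milliseconds) in the response header, so one simple way to watch query execution time is to pull that value out of a response. The functions below are a sketch under assumptions: extract_qtime and solr_qtime are made-up names, the response is expected in Solr's XML format, and the URL in the comment (host, port, core name) is hypothetical and must be adapted.

```shell
# Read a Solr XML response on stdin and print the QTime value (milliseconds).
extract_qtime() {
    sed -n 's/.*<int name="QTime">\([0-9][0-9]*\)<\/int>.*/\1/p'
}

# Fetch a query URL and print its QTime.
# Hypothetical example:
#   solr_qtime 'http://localhost:8080/solr/collection1/select?q=*:*&rows=0'
solr_qtime() {
    curl -s "$1" | extract_qtime
}
```

Running the same query twice gives a rough idea of what the caches buy you: a much lower QTime on the second call suggests the first one was a cache miss.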

More links

JVM options reference: http://jvm-options.tech.xebia.fr/
A write-up of an application performance audit I did: http://www.pingtimeout.fr/2013/03/petclinic-performance-tuning-about.html

Hope that helps!
