supersoft

Reputation: 63

Improving Solr performance

I have deployed a 5-sharded infrastructure where:

- shard1 has 3124422 docs
- shard2 has 920414 docs
- shard3 has 602772 docs
- shard4 has 2083492 docs
- shard5 has 11915639 docs

Indexes total size: 100GB

The OS is Linux x86_64 (Fedora release 8) with vMem equal to 7872420, and I run the server using Jetty (from the Solr example download) with:

java -Xmx3024M -Dsolr.solr.home=multicore -jar start.jar

The response time for a single query is around 2-3 seconds. However, if I execute several queries at the same time, performance degrades immediately:

- 1 simultaneous query: 2516 ms
- 2 simultaneous queries: 4250, 4469 ms
- 3 simultaneous queries: 5781, 6219, 6219 ms
- 4 simultaneous queries: 6484, 7203, 7719, 7781 ms

Using JConsole to monitor the server's Java process, I checked that heap memory and CPU usage don't reach their upper limits, so the server shouldn't be overloaded. Can anyone suggest how I should tune the instance so that it is not so heavily dependent on the number of simultaneous queries?

Thanks in advance

Upvotes: 1

Views: 3100

Answers (2)

Toke Eskildsen

Reputation: 21

As I stated on the Solr mailing list, where you asked the same question 3 days ago, Solr/Lucene benefits tremendously from SSDs. While sharding across more machines or adding boatloads of RAM will also address the I/O problem, the SSD option is comparatively cheap and extremely easy.

Buy an Intel X25 G2 ($409 at NewEgg for 160GB) or one of the new SandForce-based SSDs. Put your existing 100GB of indexes on it and see what happens. That's half a day's work, tops. If it bombs, scavenge the drive for your workstation. You'll be very happy with the performance boost it gives you.

Upvotes: 2

asm

Reputation: 354

You may want to consider creating slaves for each shard so that you can support more reads (see http://wiki.apache.org/solr/SolrReplication). However, the performance you're getting isn't very reasonable to begin with.
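As a sketch of what that wiki page describes, replication is configured per core in solrconfig.xml; the host, port, core name, and confFiles below are placeholders for your deployment:

```xml
<!-- On the master (the shard that receives writes), in solrconfig.xml: -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Slaves pull a new index snapshot after each commit -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each read-only slave, pointing at that master: -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core0/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

With slaves in place you can round-robin read queries across them while the master handles indexing.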

With the response times you're seeing, it feels like your disk must be the bottleneck. It might be cheaper for you just to load up each shard with enough memory to hold its full index (20GB each?). You could look at disk access using the 'sar' utility from the sysstat package. If you're consistently seeing over 30% disk utilization on any platter while searches are ongoing, that's a good sign that you need to add some memory and let the OS cache the index.
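For instance, you can sample the disks and flag busy ones with a quick filter; the sar output below is made up for illustration (in `sar -d` output, %util is the last column):

```shell
# Collect one 10-second disk sample (requires the sysstat package):
#   sar -d 10 1
# The %util column (last field) shows how busy each device is.
# Here a made-up sample is filtered with awk to flag devices over 30%:
sar_sample='
12:00:01 dev8-0   120.3 4300.1 80.2 36.4 0.9 7.5 2.9 35.2
12:00:01 dev8-16   10.1  200.4  5.0 20.3 0.1 1.2 0.8  4.1
'
echo "$sar_sample" | awk 'NF > 2 && $NF + 0 > 30 {print $2, $NF "%"}'
# prints: dev8-0 35.2%
```

Run this during a burst of simultaneous queries; if a device shows up consistently, the disk is the bottleneck.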

Has it been a while since you've run an optimize? Perhaps part of the long lookup time is a result of a heavily fragmented index spread all over the platter.
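If so, an optimize can be triggered per core over HTTP; the host, port, and core name here are placeholders for your deployment:

```shell
# Hypothetical shard URL; substitute your own host, port, and core name.
SOLR_URL='http://localhost:8983/solr/core0'
# optimize=true asks Lucene to merge the index into fewer segments, which
# reduces fragmentation (it is slow and needs spare disk space while running).
curl "$SOLR_URL/update?optimize=true" || echo "optimize request failed (is Solr running?)"
```

Schedule it off-peak: an optimize rewrites the whole index and will hurt query latency while it runs.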

Upvotes: 2
