TLadd

Reputation: 6884

Solr Caching Update on Writes

I've been looking at potential ways to speed up Solr queries for an application I'm working on. I've read about Solr caching (https://wiki.apache.org/solr/SolrCaching), and I think the filter and query caches may be of some help. The application's config does set up these caches, but apparently with default settings that were never tuned, and our cache hit rate is relatively low.
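For context, the cache entries in solrconfig.xml look something like the following. The class names and sizes here are typical defaults and purely illustrative, not necessarily what we actually run:

```xml
<!-- Inside the <query> section of solrconfig.xml.
     Class names, sizes, and autowarmCount are illustrative defaults, not recommendations. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>

<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>
```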

One detail I haven't been able to determine is how the caches handle updates. If I update records in a way that should add entries to or remove them from the query or filter cache, do the caches update in a performant way? The application is fairly write-heavy, so whether the caches handle updates gracefully will probably determine whether tuning them is worth the effort.

Upvotes: 0

Views: 469

Answers (1)

Patrick Lee

Reputation: 2013

The short answer is that an update (add, edit, or delete) on your index followed by a commit operation produces a new version of the index that replaces the current one. Since caches are associated with a specific index version, they are discarded when the index is replaced. If autowarming is enabled, the caches on the new index will be primed with recent queries or with queries that you specify.
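To make that concrete, autowarming is controlled per cache via the autowarmCount attribute in solrconfig.xml, and you can also register explicit warming queries with a newSearcher listener. The snippet below is only a sketch; the counts, fields, and query values are placeholders you would replace with ones that matter for your application:

```xml
<!-- Inside the <query> section of solrconfig.xml. -->

<!-- Seed the new searcher's filter cache from the 32 most recently used entries
     of the old searcher's cache. -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>

<!-- Run explicit warming queries against the new searcher before it serves requests.
     The query, filter, and sort below are placeholders. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">category:popular</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>
```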

However, this is Solr we're talking about, and there are usually multiple ways to handle any situation. That is definitely the case here. The commit operation mentioned above is known as a hard commit, and it may or may not be happening depending on your Solr configuration and how your application interacts with it. There's another option known as a soft commit that I believe would be a good choice for your index. Here's the difference...

A hard commit means that the index is rebuilt and then persisted to disk. This ensures that changes are not lost, but is an expensive operation.

A soft commit means that the index is updated in memory and not persisted to disk. This is a far less expensive operation, but data could conceivably be lost if Solr is halted unexpectedly.

Going a step further, Solr has two nifty settings known as autoCommit and autoSoftCommit, which I highly recommend. If you enable them, you should disable explicit hard commits in your application code. The autoCommit setting can specify a period of time to queue up document changes (maxTime) and/or the number of changes to allow in the queue (maxDocs). When either limit is reached, a hard commit is performed. The autoSoftCommit setting works the same way but results in (you guessed it) a soft commit. Solr's documentation on UpdateHandlers is a good starting point for learning about this.
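For illustration, here is roughly what those settings look like in the updateHandler section of solrconfig.xml. The numbers are placeholders rather than recommendations; also note the openSearcher flag, which, when set to false, lets the hard commit persist data to disk without opening a new searcher, leaving the searcher (and cache) turnover to the soft commits:

```xml
<!-- Inside solrconfig.xml. maxTime values are in milliseconds; all numbers are placeholders. -->
<updateHandler class="solr.DirectUpdateHandler2">

  <!-- Hard commit: flush queued changes to disk every 60 seconds or every 10,000 docs,
       whichever comes first. With openSearcher=false this commit does not make changes
       visible to searches and does not replace the current searcher or its caches. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <maxDocs>10000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: make changes visible to searches every 2 seconds without the cost
       of writing segments to disk. -->
  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>

</updateHandler>
```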

These settings effectively let Solr batch up changes and commit them together instead of committing after every update. In a write-heavy application such as yours, this is definitely a good idea. The optimal settings will depend on the frequency of reads vs. writes and, of course, the business requirements of the application. If near-real-time (NRT) search is a requirement, you may want autoSoftCommit set to a few seconds. If it's acceptable for search results to be a bit stale, consider setting autoSoftCommit to a minute or even a few minutes. The autoCommit interval is usually set much higher, since its primary function is data integrity and persistence.

I recommend a lot of testing in a non-production environment to decide upon reasonable caching and commit settings for your application. Given that your application is write-heavy, I would lean toward conservative cache settings and you may want to disable autowarming completely. You should also monitor cache statistics in production and reduce the size of caches with low hit rates. And, of course, keep in mind that your optimal settings will be a moving target, so you should review them periodically and make adjustments when needed.

On a related note, the Seven Deadly Sins of Solr is a great read and relevant to the topic at hand. Best of luck and have fun with Solr!

Upvotes: 0
