Preventing Solr cache flush when committing

Question

My application has low write throughput, and I can tolerate a delay of 2-3 minutes before changes are reflected in Solr search results.
Currently I commit from my indexing application (after every batch of documents) and also have the following configured on the Solr side:

solr.autoSoftCommit.maxTime : -1 (disabling auto soft commit)
solr.autoCommit.maxTime : 300000 (5 mins of hard auto commit interval)
openSearcher : false

My reasons for choosing this configuration come from my understanding of the following:

My application is read-heavy and needs a lot of caching, so I can't afford to have my caches flushed. Thus, I've disabled soft commits altogether.
I've set openSearcher to false because otherwise it would invalidate the top-level caches, which isn't desirable.

In production, I've observed that as soon as my application indexes even one document (or a batch) and then issues a commit (from my application), all my top-level caches get expunged.
I thought that relying only on hard auto commits might help, but according to this Stack Overflow link:

Hard commits are about durability, soft commits are about visibility. There are really two flavors here, openSearcher=true and openSearcher=false. First we’ll talk about what happens in both cases. If openSearcher=true or openSearcher=false, the following consequences are most important:

The tlog is truncated: A new tlog is started. Old tlogs will be deleted if there are more than 100 documents in newer, closed tlogs.
The current index segment is closed and flushed.
Background segment merges may be initiated.

The above happens on all hard commits. That leaves the openSearcher setting.

openSearcher=true: The Solr/Lucene searchers are re-opened and all caches are invalidated. Autowarming is done etc. This used to be the only way you could see newly-added documents.

openSearcher=false: Nothing further happens other than the four points above. To search the docs, a soft commit is necessary.

So, to sum up: a soft commit will flush the caches, and so will an auto hard commit with openSearcher=true, while an auto hard commit with openSearcher=false will not make my newly added changes visible.

Please point out if I've misunderstood anything.

Now here are my questions:

Is there no way to keep the top-level filter caches from being expunged when documents are added to the index, while still making the changes visible?
If that is the case, do I always have to rely on cache warmup to get some entries into the caches?
Are there approaches other than warmup that people usually use to avoid this, when they want to build a fast search product that also has some write throughput?

I've read several documentation links and articles, but I couldn't find one that properly explains which settings to use in different scenarios. It would be really helpful if someone could explain what I'm doing wrong and guide me to a proper solution.

root · Accepted Answer

Your understanding is right.

Solr caches are associated with a specific instance of an Index Searcher, a specific view of an index that doesn’t change during the lifetime of that searcher. As long as that Index Searcher is being used, any items in its cache will be valid and available for reuse.

When a new searcher is opened, the current searcher continues servicing requests while the new one auto-warms its cache. The new searcher uses the current searcher’s cache to pre-populate its own. When the new searcher is ready, it is registered as the current searcher and begins handling all new search requests. The old searcher will be closed once it has finished servicing all its requests.
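To make the searcher/cache relationship concrete, here is a toy model in Python. It is only a sketch: the Searcher class and open_new_searcher function are hypothetical names invented for illustration, not Solr or Lucene APIs, and real autowarming picks cache keys by recency rather than insertion order.

```python
# Toy model only: these names are hypothetical, not Solr or Lucene APIs.

class Searcher:
    """An immutable view of the index plus a cache that lives and dies with it."""

    def __init__(self, docs):
        self.docs = tuple(docs)   # snapshot: never changes for this searcher
        self.cache = {}           # query term -> ids of matching docs
        self.hits = 0
        self.misses = 0

    def search(self, term):
        if term in self.cache:
            self.hits += 1
            return self.cache[term]
        self.misses += 1
        result = tuple(i for i, text in enumerate(self.docs) if term in text)
        self.cache[term] = result
        return result


def open_new_searcher(old, docs, autowarm_count):
    """Build a searcher over the new docs and warm it from the old cache's keys.

    Old cached results are not copied, because they may be stale. Instead, up
    to autowarm_count cached keys are re-executed against the new view.
    (Insertion order is used here for brevity.)
    """
    new = Searcher(docs)
    for term in list(old.cache)[:autowarm_count]:
        new.search(term)
    # Do not count warming queries as user traffic.
    new.hits = new.misses = 0
    return new


index = ["red apple", "green apple", "red car"]
current = Searcher(index)
current.search("red")            # miss: computed and cached on this searcher
current.search("red")            # hit

index.append("red bike")         # a write; `current` still sees its snapshot
stale = current.search("red")    # cached result, does not include the new doc

current = open_new_searcher(current, index, autowarm_count=1)
fresh = current.search("red")    # served from the warmed cache, includes the new doc
```

The old cache cannot simply be carried over, because its entries describe the old view of the index. Warming replays the keys, not the values, which is why it costs time on every new searcher.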

If you need your searcher to have access to newly added docs, you need to open a new searcher, which can be done either with a soft commit or with a hard commit with openSearcher=true. The downside is that your top-level caches will be invalidated. That's the price you pay for visibility.
Yes, warmup is the best way to get your caches populated before a new searcher starts serving requests. You should identify the most commonly used queries in your system and have those autowarm the new caches.
If you do not need real-time search and can tolerate some delay, you should turn off soft commits and use hard commits with openSearcher=true. The interval between hard commits depends on how much latency your application can tolerate. If you don't mind that a document indexed at t=t1 does not appear until t=t1+x minutes, you should commit every x minutes.
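As a rough sketch of the "commit every x minutes" setup, the Python snippet below builds a request body for Solr's Config API. The helper name commit_settings_payload is made up for this example. The property names are the ones I understand the Config API's set-property command to accept, so please verify them against the documentation for your Solr version before relying on this. Nothing is sent over the network here; the body would be POSTed to your collection's /config endpoint.

```python
import json


def commit_settings_payload(visibility_minutes):
    """Build a Config API `set-property` body: hard commits that open a searcher.

    visibility_minutes is the "x" from the answer: the longest delay you accept
    before an indexed document becomes searchable.
    """
    if visibility_minutes <= 0:
        raise ValueError("visibility_minutes must be positive")
    max_time_ms = int(visibility_minutes * 60 * 1000)
    return {
        "set-property": {
            # Assumed property names; check them for your Solr version.
            "updateHandler.autoSoftCommit.maxTime": -1,       # soft commits off
            "updateHandler.autoCommit.maxTime": max_time_ms,  # hard commit interval
            "updateHandler.autoCommit.openSearcher": True,    # make changes visible
        }
    }


payload = commit_settings_payload(3)
body = json.dumps(payload, indent=2)
print(body)
```

With an interval like this in place, the per-batch commits from the indexing application are no longer needed for visibility. An explicit commit opens a new searcher by default, so each one also costs you the caches.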

Every option comes with a downside. You need to figure out what works best for you.

There is no free lunch.
