Reputation: 1221
We have some API resources under heavy load, where responses are dynamic, and to offload the origin servers we are using Varnish as a caching layer in front. The API responds with Cache-Control headers ranging from max-age=5 to max-age=15. Since we are using such a low cache TTL, a lot of requests still end up in a backend fetch. Given that, we are not sure we understand Varnish request coalescing correctly with regard to grace. We have not touched any grace settings, and we are neither using grace from VCL nor sending stale-while-revalidate headers from the backend.
So the question is: after a resource expires from the cache, will all requests for that resource wait in Varnish until the resource is fresh in the cache again, to prevent the thundering herd problem? Or will the default grace settings prevent “waiting” requests, since they will be served “stale” content while the backend fetch completes? From the docs it is not clear to us how the defaults work.
Upvotes: 4
Views: 1861
Reputation: 4808
The total lifetime of an object is the sum of the following items:
TTL + grace + keep
Let's break this down:

- TTL is the amount of time an object is considered fresh. As long as the TTL hasn't expired, the object is served straight from cache.
- Grace is the amount of time an expired object may still be served stale past its TTL, while a single asynchronous backend fetch revalidates it.
- Keep is the amount of time an expired object is kept around beyond grace, so it can still be revalidated with a conditional backend request (If-Modified-Since / If-None-Match).
Here's the order of execution:

- If TTL > 0: the object is fresh and is delivered from cache.
- If TTL + grace > 0: the stale object is delivered from cache and one asynchronous backend fetch is triggered.
- If TTL + grace + keep > 0: a synchronous backend fetch is triggered, with conditional request headers where possible.
- Otherwise: a regular synchronous backend fetch is triggered.
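To make those three timers concrete, here's a minimal vcl_backend_response sketch that sets them explicitly (the durations are arbitrary illustration values, not recommendations):

sub vcl_backend_response {
    # Fresh for 10 seconds: delivered from cache, no backend traffic.
    set beresp.ttl = 10s;
    # For 30 more seconds: delivered stale while one async fetch revalidates.
    set beresp.grace = 30s;
    # For 5 more minutes: kept only so a synchronous fetch can revalidate
    # with If-Modified-Since / If-None-Match.
    set beresp.keep = 5m;
}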
The waiting list in Varnish that is used for request coalescing is only used for non-cached objects or expired objects that are past their grace time.
The following scenarios will not trigger request coalescing:

- TTL > 0
- TTL + grace > 0
When the object is fresh or within grace, there is no need to use the waiting list, because the content will still be served from cache. In the case of objects within grace, a single asynchronous backend request will be sent to the origin for revalidation.
When an object is not in cache or is out of grace, a synchronous revalidation is required, which is a blocking action. To prevent this from becoming problematic when multiple clients request the same object, a waiting list is used and these requests are coalesced into a single backend request.
In the end, all the queued requests are satisfied in parallel by the same backend response.
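Since grace is what keeps requests off the waiting list, extending it is the usual remedy for the situation in the question. A minimal sketch, assuming a recent Varnish (6.x) where req.grace is writable in vcl_recv and vmod_std is available:

import std;

sub vcl_backend_response {
    # Allow stale delivery for up to 2 minutes past the TTL, while a
    # single asynchronous fetch revalidates the object.
    set beresp.grace = 2m;
}

sub vcl_recv {
    if (std.healthy(req.backend_hint)) {
        # While the backend is healthy, cap the staleness clients accept;
        # the full 2 minutes only kicks in when the backend is sick.
        set req.grace = 10s;
    }
}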
But here's an important remark about request coalescing:
Request coalescing only works for cacheable content. Stateful content that can never be satisfied by a coalesced response should bypass the waiting list. If not, serialization will take place.
Serialization is a bad thing. It means that queued requests cannot be satisfied by the response, and are handled serially. This head-of-line blocking can cause significant delays.
That's why stateful/uncacheable content should bypass the waiting list.
The decision to bypass the waiting list is made by the hit-for-miss cache. This mechanism caches the decision not to cache.
The following code is used for that:
set beresp.ttl = 120s;
set beresp.uncacheable = true;
It's the kind of VCL code you'll find in the built-in VCL of Varnish. It is triggered when a Set-Cookie header is found, or when the Cache-Control header contains private, no-cache, or no-store.
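For reference, here's a simplified sketch along the lines of that built-in logic (the real built-in VCL also checks Surrogate-Control and Vary: *):

sub vcl_backend_response {
    if (beresp.ttl <= 0s ||
        beresp.http.Set-Cookie ||
        beresp.http.Cache-Control ~ "(?i:private|no-cache|no-store)") {
        # Hit-for-miss: cache the decision not to cache for 2 minutes,
        # so follow-up requests bypass the waiting list.
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
        return (deliver);
    }
}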
This implies that for the next 2 minutes the object will be served from the origin, and the waiting list will be bypassed. If a later cache miss returns a cacheable response, the object is stored in cache after all, and hit-for-miss no longer applies.
With that in mind, it is crucial not to set beresp.ttl to zero, because that would immediately expire the hit-for-miss information, and the next request would still end up on the waiting list, even though we know the response will not be cacheable.
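A before/after sketch of that pitfall (X-No-Cache is a hypothetical backend header, used here purely for illustration):

sub vcl_backend_response {
    if (beresp.http.X-No-Cache) {
        # Good: hit-for-miss is remembered for 2 minutes, so follow-up
        # requests bypass the waiting list.
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;

        # Bad: a zero TTL would expire the hit-for-miss object right away,
        # putting the next request back on the waiting list.
        # set beresp.ttl = 0s;
        # set beresp.uncacheable = true;
    }
}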
Upvotes: 13