Reputation: 1221
We have some API resources under heavy load, where responses are dynamic, and to offload the origin servers we are using Varnish as a caching layer in front. The API responds with Cache-Control headers ranging from max-age=5 to max-age=15. Since we are using such a low cache TTL, a lot of requests still end up in a backend fetch. Given that, we are not sure we understand Varnish request coalescing correctly with regard to grace. We have not touched any grace settings, and we are neither using grace from VCL nor sending stale-while-revalidate headers from the backend.
So the question is: after a resource expires from the cache, will all requests for that resource wait in Varnish until the resource is fresh in the cache again, to prevent the thundering herd problem? Or will the default grace settings prevent “waiting” requests, since they will be served “stale” content while the backend fetch completes? From the docs it is not clear to us how the defaults work.
Upvotes: 4
Views: 1861
Reputation: 4808
The total lifetime of an object is the sum of the following items:
TTL + grace + keep
Let's break this down:

- TTL is the amount of time an object is considered fresh. As long as the TTL hasn't expired, the object is served straight from cache.
- Grace is the amount of time an expired object may still be served stale past its TTL, while a single asynchronous backend fetch revalidates it.
- Keep is the amount of time an expired object is kept around beyond grace, so it can still be revalidated with a conditional backend request (If-Modified-Since / If-None-Match).
Here's the order of execution:

- If TTL > 0: the object is fresh and is delivered from cache.
- If TTL + grace > 0: the stale object is delivered from cache and one asynchronous backend fetch is triggered.
- If TTL + grace + keep > 0: a synchronous backend fetch is triggered, with conditional request headers where possible.
- Otherwise: a regular synchronous backend fetch is triggered.
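To make those three timers concrete, here's a minimal vcl_backend_response sketch that sets them explicitly (the durations are arbitrary illustration values, not recommendations):

sub vcl_backend_response {
    # Fresh for 10 seconds: delivered from cache, no backend traffic.
    set beresp.ttl = 10s;
    # For 30 more seconds: delivered stale while one async fetch revalidates.
    set beresp.grace = 30s;
    # For 5 more minutes: kept only so a synchronous fetch can revalidate
    # with If-Modified-Since / If-None-Match.
    set beresp.keep = 5m;
}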
The waiting list in Varnish that is used for request coalescing is only used for non-cached objects or expired objects that are past their grace time.
The following scenarios will not trigger request coalescing:

- TTL > 0
- TTL + grace > 0
When the object is fresh or within grace, there is no need to use the waiting list, because the content will still be served from cache. In the case of objects within grace, a single asynchronous backend request will be sent to the origin for revalidation.
When an object is not in cache or is out of grace, a synchronous revalidation is required, which is a blocking action. To prevent this from becoming problematic when multiple clients request the same object, a waiting list is used and these requests are coalesced into a single backend request.
In the end, all the queued requests are satisfied in parallel by the same backend response.
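Since grace is what keeps requests off the waiting list, extending it is the usual remedy for the situation in the question. A minimal sketch, assuming a recent Varnish (6.x) where req.grace is writable in vcl_recv and vmod_std is available:

import std;

sub vcl_backend_response {
    # Allow stale delivery for up to 2 minutes past the TTL, while a
    # single asynchronous fetch revalidates the object.
    set beresp.grace = 2m;
}

sub vcl_recv {
    if (std.healthy(req.backend_hint)) {
        # While the backend is healthy, cap the staleness clients accept;
        # the full 2 minutes only kicks in when the backend is sick.
        set req.grace = 10s;
    }
}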
But here's an important remark about request coalescing:
Request coalescing only works for cacheable content. Stateful content that can never be satisfied by a coalesced response should bypass the waiting list. If not, serialization will take place.
Serialization is a bad thing. It means that queued requests cannot be satisfied by the response, and are handled serially. This head-of-line blocking can cause significant delays.
That's why stateful/uncacheable content should bypass the waiting list.
The decision to bypass the waiting list is made by the hit-for-miss cache. This mechanism caches the decision not to cache.
The following code is used for that:
set beresp.ttl = 120s;
set beresp.uncacheable = true;
It's the kind of VCL code you'll find in the built-in VCL of Varnish. It is triggered when a Set-Cookie header is found, or when the Cache-Control header contains private, no-cache, or no-store.
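For reference, here's a simplified sketch along the lines of that built-in logic (the real built-in VCL also checks Surrogate-Control and Vary: *):

sub vcl_backend_response {
    if (beresp.ttl <= 0s ||
        beresp.http.Set-Cookie ||
        beresp.http.Cache-Control ~ "(?i:private|no-cache|no-store)") {
        # Hit-for-miss: cache the decision not to cache for 2 minutes,
        # so follow-up requests bypass the waiting list.
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
        return (deliver);
    }
}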
This implies that for the next 2 minutes the object will be served from the origin, and the waiting list will be bypassed. If a later cache miss returns a cacheable response, the object is stored in cache after all, and hit-for-miss no longer applies.
With that in mind, it is crucial not to set beresp.ttl to zero, because that would immediately expire the hit-for-miss information, and the next request would still end up on the waiting list, even though we know the response will not be cacheable.
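A before/after sketch of that pitfall (X-No-Cache is a hypothetical backend header, used here purely for illustration):

sub vcl_backend_response {
    if (beresp.http.X-No-Cache) {
        # Good: hit-for-miss is remembered for 2 minutes, so follow-up
        # requests bypass the waiting list.
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;

        # Bad: a zero TTL would expire the hit-for-miss object right away,
        # putting the next request back on the waiting list.
        # set beresp.ttl = 0s;
        # set beresp.uncacheable = true;
    }
}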
Upvotes: 13