Alexander Craggs

Reputation: 8819

How do I instruct Varnish to cache based on response header?

I have a series of videos on various URLs across the site. I would like to cache them all with Varnish, even if the user is logged in. I can use the VCL configuration in order to whitelist certain URLs for caching. But I don't know how I can whitelist all videos.

Is there a way to say that all responses that return with a content type of video/mp4 are cached?

Upvotes: 0

Views: 2790

Answers (1)

Thijs Feryn

Reputation: 4828

Deciding to serve an object from cache and deciding to store an object in cache are two different things in Varnish. Both need to be accounted for.

Built-in VCL

In order to understand what happens out-of-the-box, you need to have a look at the following VCL file: https://github.com/varnishcache/varnish-cache/blob/master/bin/varnishd/builtin.vcl

This is the built-in VCL. For each subroutine, its logic is executed whenever your VCL file doesn't do an explicit return(xyz) in the corresponding subroutine.

This means you have some sort of safety net to protect you.

From a technical perspective, the VCL compiler appends the built-in VCL to the subroutines you define in your VCL file before compiling that file into C code.
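As a minimal sketch of that fall-through behavior (the /admin path and the backend address are made-up placeholders, not something from the question): requests that hit the explicit return skip the built-in vcl_recv, while every other request continues into it.

```vcl
vcl 4.1;

# Hypothetical backend; replace with your own origin server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Explicit return: the built-in vcl_recv is NOT executed for these requests.
    if (req.url ~ "^/admin") {
        return (pass);
    }
    # No return for anything else: execution continues into the
    # built-in vcl_recv, which acts as the safety net.
}
```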

What do we learn from the built-in VCL

The built-in VCL teaches us the following things when it comes to cacheability:

  • Varnish will only serve an object from cache for GET and HEAD requests (see vcl_recv)
  • Varnish will not serve an object from cache if a Cookie or Authorization header is present (see vcl_recv)
  • Varnish will not store an object in cache if a Set-Cookie header is present (see vcl_backend_response)
  • Varnish will not store an object in cache if TTL is zero or less (see vcl_backend_response)
  • Varnish will not store an object in cache if the Surrogate-Control header contains no-store (see vcl_backend_response)
  • Varnish will not store an object in cache if there is no Surrogate-Control header and the Cache-Control header contains no-cache, no-store or private (see vcl_backend_response)
  • Varnish will not store an object in cache if the Vary header performs cache variations on all headers via a * (see vcl_backend_response)
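For orientation, here is a condensed paraphrase of the cacheability checks in the built-in VCL as I remember them from the 6.x series. It is not a verbatim copy: the real file has additional checks (for example for unknown request methods and for backend requests that are already marked uncacheable) and differs between versions, so treat the linked builtin.vcl as the source of truth.

```vcl
# Condensed paraphrase of the built-in cacheability logic, not verbatim.
sub vcl_recv {
    if (req.method != "GET" && req.method != "HEAD") {
        # Only GET and HEAD requests are looked up in cache.
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        # Requests that carry user state bypass the cache.
        return (pass);
    }
    return (hash);
}

sub vcl_backend_response {
    if (beresp.ttl <= 0s ||
        beresp.http.Set-Cookie ||
        beresp.http.Surrogate-Control ~ "(?i)no-store" ||
        (!beresp.http.Surrogate-Control &&
         beresp.http.Cache-Control ~ "(?i:no-cache|no-store|private)") ||
        beresp.http.Vary == "*") {
        # Remember for 2 minutes that this object should not be cached.
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
    }
    return (deliver);
}
```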

How to make sure video files are served from cache

In vcl_recv you have to make sure Varnish is willing to look up video requests in cache. In practical terms, this means taking care of the cookies.

My advice would be to remove all cookies, except the ones you really need. The example below will remove all cookies, except the PHPSESSID cookie, which is required by my backend:

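vcl 4.1;

sub vcl_recv {
    if (req.http.Cookie) {
        # Prefix with ";" and normalize separators so every cookie starts with ";"
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        # Mark the cookies to keep by inserting a space after their ";"
        set req.http.Cookie = regsuball(req.http.Cookie, ";(PHPSESSID)=", "; \1=");
        # Drop every cookie that was not marked
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        # Trim leftover separators at both ends
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        # Remove the header entirely if nothing is left
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}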

This example will remove tracking cookies from the request, which is fine, because they are processed by JavaScript in the browser and the backend doesn't need them.

When the PHPSESSID cookie is not set, vcl_recv will fall back on the built-in VCL, and the request will be served from cache.
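If your Varnish build ships vmod_cookie (it has been part of the main distribution since the 6.4 release, as far as I know), the same cleanup can be sketched without the regex chain. This is an alternative under that assumption, not a required change; PHPSESSID is again just the cookie my backend happens to need.

```vcl
vcl 4.1;

import cookie;

sub vcl_recv {
    if (req.http.Cookie) {
        # Parse the Cookie header into the vmod's internal list.
        cookie.parse(req.http.Cookie);
        # Keep only the listed cookies (comma-separated names).
        cookie.keep("PHPSESSID");
        # Write the remaining cookies back to the request.
        set req.http.Cookie = cookie.get_string();

        # Remove the header entirely if nothing is left.
        if (req.http.Cookie == "") {
            unset req.http.Cookie;
        }
    }
}
```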

But in your case, you want them to be served from cache, even when users are logged in. This is fine, because videos are static files that aren't influenced by state.

The problem is that the response's Content-Type is not known yet in the request context (vcl_recv), so you cannot match on it there. You'll have to use the URL.

Here's an example:

sub vcl_recv {
    if(req.url ~ "^/video") {
        return(hash);
    }
}

This snippet will bypass the built-in VCL, and will explicitly look the object up in cache, if the URL matches the ^/video regex pattern.
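Because return(hash) skips the built-in checks, it also skips the GET/HEAD check. Below is a slightly more defensive sketch for videos that live on various URLs. It assumes the video URLs end in one of the listed file extensions and that the backend does not need the cookies to serve them; both are assumptions about your site, so adjust or drop them as needed.

```vcl
# Fragment: assumes it is merged into a VCL file that already declares
# "vcl 4.1;" and a backend.
sub vcl_recv {
    # Assumption: video URLs end in one of these extensions,
    # optionally followed by a query string.
    if ((req.method == "GET" || req.method == "HEAD") &&
        req.url ~ "(?i)\.(mp4|webm|m4v)(\?.*)?$") {
        # Assumption: videos do not depend on the session, so the cookies
        # are dropped before the lookup and before any backend fetch.
        unset req.http.Cookie;
        return (hash);
    }
}
```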

How to make sure video files are stored in cache

When you do an explicit return(hash) in vcl_recv, the hash will be created and a cache lookup takes place. But if the object is not stored in cache, you'll still have a miss, which results in a backend request.

When the backend response comes back, it needs to be stored in cache for a certain amount of time. Given the built-in VCL, you have to make sure the TTL is not zero, and the Cache-Control response header must not contain directives that prevent caching.

This is how I would set the Cache-Control header if for example we want to cache video files for a day:

Cache-Control: public, max-age=86400

Varnish will respect this header, and will set the TTL to 1 day based on the max-age syntax.

Even if you don't specify a Cache-Control header, Varnish will still store it in cache, but for 2 minutes, which is the default TTL.
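If 2 minutes is too short for videos, one option is to raise the TTL only when the backend sent no freshness information at all. The sketch below is a hypothetical policy, not something the built-in VCL does: the 1 day value and the video/ match are my assumptions. Because there is no explicit return, the built-in vcl_backend_response still runs afterwards and still refuses to cache responses with Set-Cookie and the like.

```vcl
# Fragment: assumes it is merged into a VCL file that already declares
# "vcl 4.1;" and a backend.
sub vcl_backend_response {
    # Only touch videos for which the backend gave no caching hints,
    # i.e. where Varnish would otherwise fall back to the default TTL.
    if (beresp.http.Content-Type ~ "^video/" &&
        !beresp.http.Cache-Control &&
        !beresp.http.Expires) {
        set beresp.ttl = 1d;
    }
    # No return here: the built-in safety checks still apply.
}
```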

Here's an example where Varnish will not store the object in cache, based on the Cache-Control header:

Cache-Control: private, max-age=0, s-maxage=0, no-cache, no-store

If any one of these directives is in Cache-Control, Varnish will make the object uncacheable.

If Set-Cookie headers are part of the response, the object becomes uncacheable as well.

In case you don't have full control over the headers that are returned by the backend server, you can still force caching in VCL.

Here's a VCL snippet where we force objects to be stored in cache for images and videos:

sub vcl_backend_response {
    if(beresp.http.Content-Type ~ "^(image|video)/") {
        set beresp.ttl = 1d;
        unset beresp.http.set-cookie;
        return (deliver);
    }
}

This example will strip off Set-Cookie headers, override the TTL to one day, and explicitly store and deliver the object. This only happens when the Content-Type response header starts with either image/ or video/.
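To check whether videos really end up being served from cache, a small debug header in vcl_deliver can help. This is only a sketch: the X-Cache and X-Cache-Hits header names are arbitrary choices of mine, and a request that was passed also shows up as MISS here.

```vcl
# Fragment: assumes it is merged into a VCL file that already declares
# "vcl 4.1;" and a backend.
sub vcl_deliver {
    # obj.hits counts how often this object has been served from cache;
    # it is 0 for misses and passes.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
    set resp.http.X-Cache-Hits = obj.hits;
}
```

Requesting the same video twice should then show MISS on the first response and HIT on the second, including for requests that carry a session cookie.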

Upvotes: 2
