Abhishek Surve

Reputation: 17

Cache same synthetic response for multiple requests with similar cache key at once

I'm trying to tackle a particular use case with respect to caching multiple requests at once.

Example:

Request 1

(Note: At the moment the logic that chooses the cache key is fixed, i.e. it considers the resource path as well as all input query parameters.)

Request 2

Goals

Note

Thanks

Upvotes: 0

Views: 325

Answers (1)

Thijs Feryn

Reputation: 4808

The assumption I'm going to make is that a non-existent user should always be considered "not found", regardless of the query string parameters used. The output will always be an empty JSON object.

As soon as the user is found, query string parameters matter again.

I have a solution that marks non-existent users and strips off query string parameters as long as the result is an HTTP 404 response code.

Here's the code. It uses vmod_var, which is part of https://github.com/varnish/varnish-modules/ and needs to be compiled from source. But since you mentioned xkey, I'm going to assume you already compiled these modules:

vcl 4.0;

import var;
import std;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

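sub vcl_recv {
    # Sort query string parameters so their order doesn't create extra variations
    set req.url = std.querysort(req.url);
    # For users already marked as "notfound", strip off the query string
    if (req.url ~ "^/user/([0-9]+)/" &&
        var.global_get("User:" + regsub(req.url, "^/user/([0-9]+)/.*$", "\1")) == "notfound") {
        set req.url = regsub(req.url, "\?.*$", "");
    }
}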

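sub vcl_backend_response {
    if (bereq.url ~ "^/user/([0-9]+)/") {
        # Log the user id so it shows up under the VCL_Log tag in varnishlog
        std.log("User: " + regsub(bereq.url, "^/user/([0-9]+)/.*$", "\1"));
        # Assume the user exists ...
        var.global_set("User:" + regsub(bereq.url, "^/user/([0-9]+)/.*$", "\1"), "found");
        # ... and override the marker when the backend returns a 404
        if (beresp.status == 404) {
            var.global_set("User:" + regsub(bereq.url, "^/user/([0-9]+)/.*$", "\1"), "notfound");
        }
    }
}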

vcl_backend_response logic

In vcl_backend_response, the VCL subroutine that is called prior to cache insertion, we'll look for responses that originate from a URL that matches the ^/user/([0-9]+)/ regex pattern.

Through std.log("User: " + regsub(bereq.url,"^/user/([0-9]+)/.*$","\1")); we're logging the id of the user that was requested. The VCL_Log:User tag can be used in varnishlog to capture that logged user.

We'll extract the user id from the URL and store it in a global variable for later use. Initially we'll assume the user exists by setting the global variable to found.

If we notice that the response status code for this type of response is 404, we know the user doesn't exist, so we override the global variable.

vcl_recv logic

Now that we know the status of certain requested users, we can reduce the number of cached variations for non-existent users.

The first thing we'll do is alphabetically sort the query string. This will reduce the number of variations if someone puts the query string parameters in a different order.

If the request matches the ^/user/([0-9]+)/ URL pattern, we'll fetch the global variable that corresponds to the user id.

  • If the variable doesn't exist for the user, we'll proceed with the normal logic
  • If the variable exists and the value doesn't equal notfound, we'll also proceed with the normal logic
  • If the variable exists and has notfound as its value, we'll start narrowing down potential variations by stripping off query string parameters
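The three cases above can also be spelled out more explicitly. The following is only a sketch of an alternative vcl_recv that would replace the one shown earlier, not an addition to it. It assumes the same import var; and import std; lines, and user_status is a hypothetical variable name picked for this example. It looks the global variable up once and keeps the result in a request-local variable:

```vcl
sub vcl_recv {
    set req.url = std.querysort(req.url);

    if (req.url ~ "^/user/([0-9]+)/") {
        # Look the status up once and keep it in a request-local variable.
        # "user_status" is a hypothetical name chosen for this sketch.
        var.set("user_status",
            var.global_get("User:" + regsub(req.url, "^/user/([0-9]+)/.*$", "\1")));

        if (var.get("user_status") == "notfound") {
            # Known missing user: collapse all variations into /user/$ID/
            set req.url = regsub(req.url, "\?.*$", "");
        }
        # No value yet, or "found": leave the URL and its query string untouched.
    }
}
```

The behavior should be the same as the original subroutine. The only difference is that the lookup and the decision are separated, which can make it easier to add logging or extra states later.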

By stripping off query string parameters for a user that is not found, we're reducing the URL, and as a consequence the cache key, to /user/$ID/.

While this could potentially result in a cache miss, the corresponding cache hit will match every future request for that user, regardless of the query string parameters. Of course this logic only applies for the duration that the 404 is cached.

As soon as the user object expires from the cache, you go back into the vcl_backend_response logic and the user status is re-evaluated.
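Because the user status is only re-evaluated when the cached 404 expires, the TTL of that 404 decides how quickly a newly created user becomes visible. If the backend doesn't send caching headers for 404 responses, you could set the TTL yourself. This is a sketch that assumes one minute is an acceptable delay, so adjust the value to your situation:

```vcl
sub vcl_backend_response {
    if (bereq.url ~ "^/user/([0-9]+)/" && beresp.status == 404) {
        # Assumed value: keep the "not found" answer for one minute,
        # after which the backend is asked again and the user is re-evaluated.
        set beresp.ttl = 60s;
    }
}
```

Varnish concatenates multiple definitions of the same built-in subroutine in the order in which they appear, so this fragment can be placed after the vcl_backend_response shown earlier.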

Tracking users in varnishlog

If you want to capture logs about user matching, you can use the following varnishlog command:

varnishlog -g request -i requrl -I respheader:Age -i berequrl -I VCL_Log:User -i berespstatus

Here's some potential log output:

*   << Request  >> 2
-   ReqURL         /user/1/?a=1
-   ReqURL         /user/1/?a=1
-   RespHeader     Age: 0
**  << BeReq    >> 3
--  BereqURL       /user/1/?a=1
--  BerespStatus   404
--  VCL_Log        User: 1

*   << Request  >> 5
-   ReqURL         /user/1/?a=1
-   ReqURL         /user/1/?a=1
-   ReqURL         /user/1/
-   RespHeader     Age: 0
**  << BeReq    >> 6
--  BereqURL       /user/1/
--  BerespStatus   404
--  VCL_Log        User: 1

*   << Request  >> 32770
-   ReqURL         /user/1/?a=1
-   ReqURL         /user/1/?a=1
-   ReqURL         /user/1/
-   RespHeader     Age: 2

*   << Request  >> 32772
-   ReqURL         /user/1/?b=1
-   ReqURL         /user/1/?b=1
-   ReqURL         /user/1/
-   RespHeader     Age: 7

Here's what happens in this output:

  • The first request for /user/1/?a=1 is a cache miss, and the user is marked as not found
  • The second request spots the non-existent user and turns the URL into /user/1/, which also results in a cache miss
  • The third request also strips off the query string parameters and results in a cache hit
  • The fourth request, which has a different query string parameter (/user/1/?b=1), also results in a cache hit, because query string parameters are stripped off

The Age response header shows how long the object has been stored in cache for. You'll notice that the cache age for /user/1/?a=1 is 0 twice because of the cache misses, but then it turns into 2.

A couple of seconds later /user/1/?b=1 is called, but because we turn it into /user/1/, a cached object is hit and the age goes to 7.
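If you'd rather see hits and misses directly in the response instead of deriving them from the Age header, a debug header can help. This is a minimal sketch. X-Cache is a hypothetical header name chosen for this example, and it is not part of the VCL above:

```vcl
sub vcl_deliver {
    # obj.hits counts how many times the object was served from cache.
    # "X-Cache" is a hypothetical header name used only for debugging.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

With this in place, the first two requests in the log above should be reported as a miss and the last two as a hit.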

Conclusion

While this VCL code should help you reduce the number of cache variations per user, it is a very narrowly scoped example with a lot of assumptions.

Use it as inspiration, but modify heavily to your use case.

Upvotes: 0
