Reputation: 17
Trying to tackle a particular use case with respect to caching multiple requests at once.
Example:
Request 1
/user/1/config?arg1=val1&arg2=val2&arg3=val3
responds with {} (JSON with a 200 status code, whatever the query parameters are) and xkey=user/1/config
(Note: at the moment the logic choosing the cache key is fixed, i.e. it considers the resource path as well as all input query parameters)
Request 2
/user/1/config
Goal: serve such requests with a synthetic response in Varnish... is there any way where we can respond from Varnish itself? Or from the application end, using some explicit status code, to avoid sending the same content?
Note: the xkey is just the resource path, since the cache purging happens based on user id.
Thanks
Upvotes: 0
Views: 325
Reputation: 4808
The assumption I'm going to make is that a non-existent user should always be considered "not found", regardless of the query string parameters used. The output will always be an empty JSON object.
As soon as the user is found, query string parameters matter again.
I have a solution that marks non-existent users and strips off query string parameters as long as the result is an HTTP 404 response code.
Here's the code. It uses vmod_var, which is part of https://github.com/varnish/varnish-modules/ and needs to be compiled from source. But since you mentioned xkey, I'm going to assume you already compiled these modules:
vcl 4.0;

import var;
import std;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    set req.url = std.querysort(req.url);
    if (req.url ~ "^/user/([0-9]+)/" &&
        var.global_get("User:" + regsub(req.url, "^/user/([0-9]+)/.*$", "\1")) == "notfound") {
        set req.url = regsub(req.url, "\?.*$", "");
    }
}

sub vcl_backend_response {
    if (bereq.url ~ "^/user/([0-9]+)/") {
        std.log("User: " + regsub(bereq.url, "^/user/([0-9]+)/.*$", "\1"));
        var.global_set("User:" + regsub(bereq.url, "^/user/([0-9]+)/.*$", "\1"), "found");
        if (beresp.status == 404) {
            var.global_set("User:" + regsub(bereq.url, "^/user/([0-9]+)/.*$", "\1"), "notfound");
        }
    }
}
In vcl_backend_response, the VCL subroutine that is called prior to cache insertion, we'll look for responses that originate from a URL matching the ^/user/([0-9]+)/ regex pattern.
Through std.log("User: " + regsub(bereq.url,"^/user/([0-9]+)/.*$","\1"));
we're logging the user that we found. The VSL_Log:User
tag can be used in varnishlog
to capture that logged user.
We'll extract the user id from the URL and store it in a global variable for later use. Initially we'll assume the user exists by setting the found
value to global variable.
If we notice that the response status code for this type of response is 404
, we know the user doesn't exist, so we override the global variable.
Now that we know the status of certain requested users, we can reduce the number of cached variations for non-existent users in vcl_recv.
The first thing we'll do is alphabetically sort the query string; this will reduce the number of variations if someone puts the query string parameters in a different order.
If the request matches the ^/user/([0-9]+)/ URL pattern, we'll fetch the global variable that corresponds to the user id. If that variable doesn't have notfound as its value, we'll just proceed with the normal logic. If it does have notfound as its value, we'll start narrowing down potential variations by stripping off the query string parameters.
By stripping off query string parameters for a user that is not found, we're reducing the URL, and as a consequence the cache key, to /user/$ID/.
While this could potentially result in a cache miss, the corresponding cache hit will match every future request for that user, regardless of the query string parameters. Of course this logic only applies for the duration that the 404 is cached.
As soon as the user object expires from the cache, you go back into the vcl_backend_response logic and the user status is re-evaluated.
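If you want explicit control over that re-evaluation window, you could cap how long the 404 is kept before it is inserted into the cache. Below is a minimal sketch; the 30-second TTL is an arbitrary value picked for illustration. Since VCL concatenates subroutines with the same name, it can live alongside the vcl_backend_response logic above:
sub vcl_backend_response {
    if (bereq.url ~ "^/user/([0-9]+)/" && beresp.status == 404) {
        # Keep "user not found" responses in cache for a short while only,
        # so the user status is re-checked against the backend soon after.
        set beresp.ttl = 30s;
    }
}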
If you want to capture logs about user matching, you can use the following varnishlog command:
varnishlog -g request -i requrl -I respheader:Age -i berequrl -I VCL_Log:User -i berespstatus
Here's some potential log output:
* << Request >> 2
- ReqURL /user/1/?a=1
- ReqURL /user/1/?a=1
- RespHeader Age: 0
** << BeReq >> 3
-- BereqURL /user/1/?a=1
-- BerespStatus 404
-- VCL_Log User: 1
* << Request >> 5
- ReqURL /user/1/?a=1
- ReqURL /user/1/?a=1
- ReqURL /user/1/
- RespHeader Age: 0
** << BeReq >> 6
-- BereqURL /user/1/
-- BerespStatus 404
-- VCL_Log User: 1
* << Request >> 32770
- ReqURL /user/1/?a=1
- ReqURL /user/1/?a=1
- ReqURL /user/1/
- RespHeader Age: 2
* << Request >> 32772
- ReqURL /user/1/?b=1
- ReqURL /user/1/?b=1
- ReqURL /user/1/
- RespHeader Age: 7
The first request for /user/1/?a=1 is a cache miss, and the user will be marked as notfound. The next request for /user/1/?a=1 is stripped down to /user/1/, which also results in a cache miss. Any subsequent request (even /user/1/?b=1) will result in a cache hit, because the query string parameters are stripped off.
The Age response header shows how long the object has been stored in cache. You'll notice that the cache age for /user/1/?a=1 is 0 twice because of the cache misses, but then it turns into 2.
A couple of seconds later /user/1/?b=1 is called, but because we turn it into /user/1/, a cached object is hit and the age goes to 7.
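Since your cache invalidation happens per user id through xkey, you'll probably also want the reduced /user/$ID/ object to carry the same tag as the full variations, so one purge removes all of them. Here's a minimal sketch using vmod_xkey, which ships in the same varnish-modules collection; the PURGE method, the xkey-purge request header name, and the tag format (user/$ID/config, borrowed from your question) are assumptions you can adapt:
import xkey;

sub vcl_recv {
    # Purge endpoint: invalidate every object tagged with the key passed
    # in the xkey-purge request header, e.g. "user/1/config".
    if (req.method == "PURGE" && req.http.xkey-purge) {
        set req.http.n-gone = xkey.purge(req.http.xkey-purge);
        return (synth(200, "Invalidated " + req.http.n-gone + " objects"));
    }
}

sub vcl_backend_response {
    if (bereq.url ~ "^/user/([0-9]+)/") {
        # Tag both the full variations and the reduced /user/$ID/ object
        # with a per-user key so they can all be purged in one go.
        set beresp.http.xkey = "user/" + regsub(bereq.url, "^/user/([0-9]+)/.*$", "\1") + "/config";
    }
}
In a real setup you'd also restrict the PURGE method to trusted clients, for example with an ACL.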
While the VCL code in this answer should help you reduce the number of cache variations per user, it is a very narrowly scoped example with a lot of assumptions.
Use it as inspiration, but adapt it heavily to your use case.
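As for the part of your question about responding from Varnish itself: once a user has been marked as notfound, you could short-circuit in vcl_recv and build a synthetic empty JSON response, so neither the cache nor the backend is touched. This is a sketch of that alternative, not part of the approach above. Keep in mind that the vmod_var global never expires, so unlike the cached 404 it won't be re-evaluated unless you reset it or combine it with a short TTL as shown earlier:
# Assumes "import var;" from the main VCL above.
sub vcl_recv {
    if (req.url ~ "^/user/([0-9]+)/" &&
        var.global_get("User:" + regsub(req.url, "^/user/([0-9]+)/.*$", "\1")) == "notfound") {
        # Answer straight from Varnish, without a cache lookup or backend fetch.
        return (synth(404, "Not Found"));
    }
}

sub vcl_synth {
    if (resp.status == 404) {
        set resp.http.Content-Type = "application/json";
        # Serve the same empty JSON object the backend would have returned.
        synthetic("{}");
        return (deliver);
    }
}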
Upvotes: 0