Leroy

Reputation: 2137

How to keep specific query parameters in Varnish

My question may seem a bit unusual. Normally you strip specific query parameters from the URL before caching in Varnish, but I want to do the opposite. I need this so that certain query parameters (such as utm_source) survive a redirect.

I have a set of query parameters that must NOT be stripped; all others can be removed.

Upvotes: 3

Views: 8025

Answers (2)

mana

Reputation: 6547

Here is an update for recent versions of Varnish, where, for example, sub vcl_fetch has been renamed to vcl_backend_response. The following worked for us:

In vcl_recv:

# Store original url in temporary header
set req.http.X-Original-Url = req.url;

# strip out query string
set req.url = regsub(req.url, "\?.*$", "");

In vcl_backend_response:

# restore URL params after a redirect
# vcl_backend_response exposes beresp and bereq (resp and req are not available here);
# the second condition skips URLs that had no query string to restore
if ((beresp.status == 301 || beresp.status == 302) && bereq.http.X-Original-Url ~ "\?") {
  set beresp.http.location = beresp.http.location + "?" + regsuball(bereq.http.X-Original-Url, "(^.*(?=\?)|[?&](?!header1|header2)\w+=[^&]*)", "");
  # strip occurrences of `?&` after removing params
  set beresp.http.location = regsub(beresp.http.location, "(\?&|\?\?)", "?");
  # some more cleanup (empty `?` or `&` at end)
  set beresp.http.location = regsub(beresp.http.location, "(\?&|\?|&)$", "");
}

Where

  • ^.*(?=\?) matches everything before the ?; removing it leaves only the query string (e.g. ?key1=value1&key2=value2)
  • [?&](?!header1|header2)\w+ matches the separator and any key name that does not start with header1 or header2
  • =[^&]* matches the = character and the value up to the next occurrence of &

The regsub calls that follow clean up the result and should be self-explanatory.
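
As a rough illustration, the sketch below assembles the two snippets into one VCL 4.0 file and traces a single hypothetical request through each substitution in the comments. The backend address, the request URL /old?foo=1&header1=a&bar=2 and the backend's Location: /new are assumptions made up for the trace; header1 and header2 are the placeholder names from the answer. It uses beresp and bereq, the objects available in vcl_backend_response, and it adds a check (not in the answer) that skips requests whose original URL had no query string.

```vcl
vcl 4.0;

# Hypothetical backend; adjust host and port for your setup.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Example request: req.url = "/old?foo=1&header1=a&bar=2"
    set req.http.X-Original-Url = req.url;
    # req.url becomes "/old"
    set req.url = regsub(req.url, "\?.*$", "");
}

sub vcl_backend_response {
    # Assumed guard: only act when the original URL had a query string
    if ((beresp.status == 301 || beresp.status == 302) && bereq.http.X-Original-Url ~ "\?") {
        # Example backend answer: Location: /new
        # regsuball removes "/old", "?foo=1" and "&bar=2", leaving "&header1=a",
        # so Location becomes "/new?&header1=a"
        set beresp.http.Location = beresp.http.Location + "?" + regsuball(bereq.http.X-Original-Url, "(^.*(?=\?)|[?&](?!header1|header2)\w+=[^&]*)", "");
        # "?&" collapses to "?": Location becomes "/new?header1=a"
        set beresp.http.Location = regsub(beresp.http.Location, "(\?&|\?\?)", "?");
        # Nothing is left dangling at the end, so this is a no-op here
        set beresp.http.Location = regsub(beresp.http.Location, "(\?&|\?|&)$", "");
    }
}
```

Because vcl_recv strips the query string before lookup, all variants of /old share one cache entry, so a redirect rewritten here may be stored with the first visitor's parameters unless it is marked uncacheable.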

Upvotes: 2

Leroy

Reputation: 2137

After some trial and error I found a way to do this.

First, we used this code in sub vcl_recv to strip the marketing query parameters from the URL:

# Store original url in temporary header
set req.http.X-Original-Url = req.url;

# Strip all marketing get parameters
if(req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=") {
    set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=[%.-_A-z0-9]+&?", "");
}
set req.url = regsub(req.url, "(\?&|\?|&)$", "");

Next, in sub vcl_fetch, we used this code to reattach the marketing query parameters after a redirect and strip all other query parameters:

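if (beresp.status == 301 || beresp.status == 302) {
    # Only append parameters when the original URL had a query string
    if (req.http.X-Original-Url ~ "\?") {
        set beresp.http.location = beresp.http.location + "?" + regsub(req.http.X-Original-Url, ".*\?(.*)", "\1");
        set beresp.http.location = regsuball(beresp.http.location, "([&|?](?!gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)[\w\d]+=([%.-_A-z0-9]+)?)", "");  # Comment or remove this line to keep the original query parameters after redirect
        # If the first parameter was stripped, its leading "?" went with it; restore it
        set beresp.http.location = regsub(beresp.http.location, "^([^?&]*)&", "\1?");
        set beresp.http.location = regsub(beresp.http.location, "(\?&|\?|&)$", "");
    }
    return (hit_for_pass);
}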

The stripping of non-marketing query parameters can also be switched on or off quickly; see the comment in the sub vcl_fetch code.
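
For readers on Varnish 4 or later, here is a hedged sketch of how the vcl_fetch block might be ported: vcl_fetch becomes vcl_backend_response, req becomes bereq, and return (hit_for_pass) is expressed by marking the response uncacheable. The backend address and the 120s TTL are assumptions, the check for a query string is an addition of this sketch, and X-Original-Url is assumed to be set in vcl_recv as in the answer.

```vcl
vcl 4.0;

# Hypothetical backend, for illustration only.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    if (beresp.status == 301 || beresp.status == 302) {
        # bereq carries the X-Original-Url header that vcl_recv set on req.
        # Assumed guard: skip URLs that had no query string.
        if (bereq.http.X-Original-Url ~ "\?") {
            # Append the original query string to the redirect target
            set beresp.http.Location = beresp.http.Location + "?" + regsub(bereq.http.X-Original-Url, ".*\?(.*)", "\1");
            # The regsuball and cleanup regsub lines from the answer
            # would follow here, operating on beresp.http.Location.
        }
        # Varnish 4+ way to create a hit-for-pass object
        set beresp.uncacheable = true;
        set beresp.ttl = 120s;
        return (deliver);
    }
}
```

Marking the redirect uncacheable keeps one visitor's marketing parameters from being served to the next, which is the role return (hit_for_pass) plays in the Varnish 3 code.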

Upvotes: 4
