benraay

Reputation: 863

Removing utm_* parameters from a URL in JavaScript with a regex

I did not find a good answer to this question, so I am sharing what I found that works.

If you want to remove all the Google Analytics parameters from a URL, you usually want to keep the other parameters and end up with a clean, valid URL.

url = url.replace(/(\&|\?)utm([_a-z0-9=+\-]+)/igm, "$1");

With a URL like this: https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?utm_source=325483&utm_medium=affiliation&utm_content=catalogue-RDC&awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249

you will get this: https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?&&&awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249

This URL is already valid, but it contains duplicate & signs. If you drop the $1 from the first replacement, the separators are removed too and you lose the ? that has to start the query string.

So the next cleanup step keeps the ? (=> $1) and removes the & signs that directly follow it.

url = url.replace(/(\?)\&+/igm, "$1");

Now we have a nice clean URL.

Full version:

url = url.replace(/(\&|\?)utm([_a-z0-9=+\-]+)/igm, "$1");
url = url.replace(/(\?)\&+/igm, "$1");

If you can find a one-liner, you're welcome to share it.

Edit: the resulting URL should be this one: https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249
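
A minimal sketch of the two passes above wrapped in a function, for trying them on other URLs. The name stripUtmTwoPass and the example.com URLs are made up for illustration. The third pass is an addition that is not in the original snippet: when utm parameters come last or sit between other parameters, the two passes alone leave a dangling ? or repeated & signs.

```javascript
// Hypothetical helper name; the first two replace() calls are the ones from the question.
function stripUtmTwoPass(url) {
  // Pass 1: drop every utm* key/value pair but keep its leading separator.
  var cleaned = url.replace(/(\&|\?)utm([_a-z0-9=+\-]+)/igm, "$1");
  // Pass 2: remove the run of "&" left directly after "?".
  cleaned = cleaned.replace(/(\?)\&+/igm, "$1");
  // Extra pass (an assumption, not part of the question): collapse repeated "&"
  // and drop a "?" or "&" left dangling at the very end.
  cleaned = cleaned.replace(/&{2,}/g, "&").replace(/[?&]+$/, "");
  return cleaned;
}

console.log(stripUtmTwoPass("https://example.com/p?utm_source=1&utm_medium=x&awc=9"));
// → https://example.com/p?awc=9
console.log(stripUtmTwoPass("https://example.com/p?a=1&utm_source=1&utm_medium=2&b=2"));
// → https://example.com/p?a=1&b=2
console.log(stripUtmTwoPass("https://example.com/p?a=1&utm_source=1"));
// → https://example.com/p?a=1
console.log(stripUtmTwoPass("https://example.com/p?utm_source=1"));
// → https://example.com/p
```

Note that the character class in pass 1 only covers letters, digits, _, =, + and -, so a value containing other characters (a dot or a percent-encoded byte, for instance) would only be removed up to that character.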

Upvotes: 3

Views: 3317

Answers (2)

Wiktor Stribiżew

Reputation: 627469

You may use a single regex compatible with all JS versions that will

  • match and capture a ? that is followed by one or more utm params, which are in turn followed by a non-utm param, and replace the match with $1 to restore the ?, since it is still needed
  • or match a ? followed by one or more utm params when the query string contains no other params (so $1 will be empty and the ? gets removed)
  • or simply match any remaining utm param and remove it.

The regex will look like

.replace(/(\?)utm[^&]*(?:&utm[^&]*)*&(?=(?!utm[^\s&=]*=)[^\s&=]+=)|\?utm[^&]*(?:&utm[^&]*)*$|&utm[^&]*/gi, '$1')

See the regex demo

Details

  • (\?)utm[^&]*(?:&utm[^&]*)*&(?=(?!utm[^\s&=]*=)[^\s&=]+=) - ?utm (with ? inside a capturing group later referenced with $1), 0+ chars other than &, then 0 or more repetitions of &utm followed by 0+ chars other than &, and then a & that must be followed by a param name (1+ chars other than whitespace, & and =, then =) that is not a utm param
  • | - or
  • \?utm[^&]*(?:&utm[^&]*)*$ - ?utm, 0+ chars other than &, and then 0 or more repetitions of &utm followed with 0+ chars other than & and then the end of the string
  • | - or
  • &utm[^&]* - a &, utm and then 0+ chars other than &

JS demo:

var urls = ['https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?utm_source=325483&utm_medium=affiliation&utm_content=catalogue-RDC&awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249', 'https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?t=55&utm_source=325483&utm_medium=affiliation&utm_content=catalogue-RDC&awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249','https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249&utm_tt=78', 'https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?utm=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249&utm=ewe'];

var u = 'utm[^&]*';
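// Backslashes must be doubled inside a string literal: a bare "\s" would reach RegExp as a plain "s".
var rx = new RegExp("(\\?)"+u+"(?:&"+u+")*&(?=(?!utm[^\\s&=]*=)[^\\s&=]+=)|\\?"+u+"(?:&"+u+")*$|&"+u, "ig");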
for (var url of urls) {
  console.log(url, "=>", url.replace(rx, '$1'));
}
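
As a rough companion to the demo, here is the same pattern written as a regex literal, where backslashes need no doubling, together with the result each alternative should produce. The helper name stripUtmSingleRegex and the short example.com URLs are assumptions made for illustration only.

```javascript
// The single pattern from this answer as a literal: \? and \s are written once.
var utmRx = /(\?)utm[^&]*(?:&utm[^&]*)*&(?=(?!utm[^\s&=]*=)[^\s&=]+=)|\?utm[^&]*(?:&utm[^&]*)*$|&utm[^&]*/gi;

// Hypothetical wrapper around the replace() call shown above.
function stripUtmSingleRegex(url) {
  return url.replace(utmRx, "$1");
}

// Alternative 1: utm params come first and another param follows, so "?" is restored.
console.log(stripUtmSingleRegex("https://example.com/p?utm_source=1&utm_medium=x&awc=9"));
// → https://example.com/p?awc=9

// Alternative 2: only utm params are present, so "?" is removed with them.
console.log(stripUtmSingleRegex("https://example.com/p?utm=1&utm=ewe"));
// → https://example.com/p

// Alternative 3: utm params after another param are removed one by one.
console.log(stripUtmSingleRegex("https://example.com/p?t=55&utm_source=1&utm_medium=x&awc=9"));
// → https://example.com/p?t=55&awc=9
console.log(stripUtmSingleRegex("https://example.com/p?awc=9&utm_tt=78"));
// → https://example.com/p?awc=9
```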

Upvotes: 3

Fallenhero

Reputation: 1583

I think it could be as simple as: url = url.replace(/(?<=&|\?)utm_.*?(&|$)/igm, "");

You do not need to escape &

(?<=&|\?) = positive lookbehind

.*? = everything, but "not greedy"
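
A small sketch of how this one-liner behaves, assuming an engine with lookbehind support (part of ES2018; engines without it reject the pattern with a SyntaxError). The helper name stripUtmLookbehind and the example.com URLs are made up. The second replace() is an addition, not part of the one-liner: because the match consumes the separator after each parameter rather than the one before it, a utm parameter in last position leaves a trailing & or ? behind.

```javascript
// Hypothetical helper; the first replace() is the one-liner from this answer.
function stripUtmLookbehind(url) {
  return url
    // Remove each utm_ pair that directly follows "?" or "&", including the
    // separator after it (or nothing, if the pair ends the string).
    .replace(/(?<=&|\?)utm_.*?(&|$)/igm, "")
    // Extra step (an assumption): drop a "?" or "&" left at the very end.
    .replace(/[?&]$/, "");
}

console.log(stripUtmLookbehind("https://example.com/p?utm_source=1&utm_medium=x&awc=9"));
// → https://example.com/p?awc=9
console.log(stripUtmLookbehind("https://example.com/p?a=1&utm_source=1&b=2"));
// → https://example.com/p?a=1&b=2
console.log(stripUtmLookbehind("https://example.com/p?a=1&utm_source=1"));
// → https://example.com/p?a=1
console.log(stripUtmLookbehind("https://example.com/p?utm_source=1"));
// → https://example.com/p
```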

Upvotes: 7
