Reputation: 863
I did not find a good answer to this question, so I'm sharing what I found that works.
If you want to remove all the Google Analytics tracking parameters (utm_...) from a URL, you usually want to keep the other parameters and end up with a clean, valid URL:
url = url.replace(/(\&|\?)utm([_a-z0-9=+\-]+)/igm, "$1");
With a URL like this:
https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?utm_source=325483&utm_medium=affiliation&utm_content=catalogue-RDC&awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249
you will get this:
https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?&&&awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249
This URL is already valid, but it contains duplicate & signs. If you remove the $1 from the first replacement, you end up with a single & sign instead of the ? that should be at the beginning of the query string.
So in a second clean-up step we keep the first ? sign (captured as $1) and remove the & signs that directly follow it:
url = url.replace(/(\?)\&+/igm, "$1");
Now we have a nice, clean URL.
Full version:
url = url.replace(/(\&|\?)utm([_a-z0-9=+\-]+)/igm, "$1");
url = url.replace(/(\?)\&+/igm, "$1");
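The two replacements can be wrapped in a reusable function. This is a minimal sketch: the name cleanUtm is hypothetical, and the two regexes are exactly the ones from the question.

```javascript
// Hypothetical helper wrapping the question's two-step approach
function cleanUtm(url) {
  // Step 1: replace each ?utm... / &utm... pair with just its leading ? or &
  url = url.replace(/(\&|\?)utm([_a-z0-9=+\-]+)/igm, "$1");
  // Step 2: collapse a ? followed by one or more & back to a single ?
  url = url.replace(/(\?)\&+/igm, "$1");
  return url;
}

const input = 'https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?utm_source=325483&utm_medium=affiliation&utm_content=catalogue-RDC&awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249';
console.log(cleanUtm(input));
// https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249
```

One thing to be aware of: step 2 only collapses the & signs that directly follow the ?, so when a non-utm param comes first, e.g. https://example.com/p?t=55&utm_source=1&awc=2, the result is https://example.com/p?t=55&&awc=2, which is still valid but keeps a doubled &.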
If you can find a one-liner, you're welcome to share it.
Edit: the resulting URL should be this one: https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249
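If a regex is not a hard requirement, the built-in URL and URLSearchParams APIs (available in modern browsers and Node.js) avoid the leftover & and ? problem entirely. This swaps in a different technique rather than answering the one-liner request; the helper name stripUtmParams is hypothetical, and it assumes that a "utm" param is any param whose name starts with utm, case-insensitively, as with the /i regexes here.

```javascript
// Hypothetical helper: removes every query param whose name starts with "utm"
// (case-insensitive), using the WHATWG URL API instead of a regex
function stripUtmParams(href) {
  const u = new URL(href);
  // Copy the keys first: deleting while iterating the live searchParams can skip entries
  const keys = [...u.searchParams.keys()];
  for (const key of keys) {
    if (key.toLowerCase().startsWith('utm')) {
      // delete() removes all params with that name; a repeated delete is a no-op
      u.searchParams.delete(key);
    }
  }
  return u.toString();
}

console.log(stripUtmParams('https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?utm_source=325483&utm_medium=affiliation&utm_content=catalogue-RDC&awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249'));
// https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249
```

Note that URLSearchParams re-serializes the remaining query string, so encodings can change (for example a%20b comes back as a+b): the result is equivalent, but not always byte-for-byte identical to the input.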
Upvotes: 3
Views: 3317
Reputation: 627469
You may use a single regex, compatible with all JS versions, that will:
- match a ? followed by one or more utm params that are in turn followed by a non-utm param, and replace it with $1 to restore the ?, since it is still needed;
- match a ? followed by one or more utm params when the query string contains no params other than utm ones (so $1 will be empty and the ? will get removed);
- match &utm params to remove them.
The regex will look like this:
.replace(/(\?)utm[^&]*(?:&utm[^&]*)*&(?=(?!utm[^\s&=]*=)[^\s&=]+=)|\?utm[^&]*(?:&utm[^&]*)*$|&utm[^&]*/gi, '$1')
See the regex demo
Details
- (\?)utm[^&]*(?:&utm[^&]*)*&(?=(?!utm[^\s&=]*=)[^\s&=]+=) - ?utm (with the ? inside a capturing group later referenced with $1), zero or more chars other than &, then zero or more repetitions of &utm followed by zero or more chars other than &, and then a & that is followed by a param name (one or more chars other than whitespace, & and =, then =) that does not start with utm
- | - or
- \?utm[^&]*(?:&utm[^&]*)*$ - ?utm, zero or more chars other than &, then zero or more repetitions of &utm followed by zero or more chars other than &, and then the end of the string
- | - or
- &utm[^&]* - a &, utm and then zero or more chars other than &
JS demo:
var urls = ['https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?utm_source=325483&utm_medium=affiliation&utm_content=catalogue-RDC&awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249', 'https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?t=55&utm_source=325483&utm_medium=affiliation&utm_content=catalogue-RDC&awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249','https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?awc=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249&utm_tt=78', 'https://www.somewebsite.fr/produit/yi-camera-3600-noir-vr-33705370/offre-81085802?utm=6901_1530705916_88ef12642ad61dfc5239ba01bbbe5249&utm=ewe'];
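// Building block: "utm" followed by any chars up to the next &
var u = 'utm[^&]*';
// In a string literal "\s" is just "s", so the backslash must be doubled
// for the RegExp to receive \s (whitespace)
var rx = new RegExp("(\\?)" + u + "(?:&" + u + ")*&(?=(?!utm[^\\s&=]*=)[^\\s&=]+=)|\\?" + u + "(?:&" + u + ")*$|&" + u, "ig");
for (var url of urls) {
  console.log(url, "=>", url.replace(rx, '$1'));
}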
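For reuse, the same pattern can also be written as a regex literal, which avoids doubling the backslashes that new RegExp() with a string needs. Below is a sketch with a hypothetical helper name, removeUtm, run on shortened example.com versions of the demo URLs; the results in the comments were traced by hand against the regex.

```javascript
// Hypothetical helper using this answer's single regex as a literal,
// so \? and \s need no extra escaping
const UTM_RX = /(\?)utm[^&]*(?:&utm[^&]*)*&(?=(?!utm[^\s&=]*=)[^\s&=]+=)|\?utm[^&]*(?:&utm[^&]*)*$|&utm[^&]*/gi;

function removeUtm(url) {
  // $1 is the captured "?" for the first alternative and empty for the other two
  return url.replace(UTM_RX, '$1');
}

const samples = [
  'https://example.com/p?utm_source=1&utm_medium=2&awc=3', // => https://example.com/p?awc=3
  'https://example.com/p?t=55&utm_source=1&awc=3',         // => https://example.com/p?t=55&awc=3
  'https://example.com/p?awc=3&utm_tt=78',                 // => https://example.com/p?awc=3
  'https://example.com/p?utm=1&utm=ewe',                   // => https://example.com/p
];
for (const s of samples) {
  console.log(s, '=>', removeUtm(s));
}
```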
Upvotes: 3
Reputation: 1583
I think it could be as simple as:
url = url.replace(/(?<=&|\?)utm_.*?(&|$)/igm, "");
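You do not need to escape the & in a regex.
(?<=&|\?) = positive lookbehind (the match must be preceded by & or ?)
.*? = any characters, but lazy ("not greedy"), so the match stops at the first & or at the end of the string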
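A quick way to try this lookbehind version, as a sketch with hypothetical helper names. Lookbehind is an ES2018 feature, so older engines (for example Safari before 16.4) throw a SyntaxError on this regex. Also, when the last param is a utm_ one, the & or ? in front of it is left behind; the second helper adds an assumed extra clean-up for that case, which is not part of the answer.

```javascript
// Hypothetical helper applying the lookbehind regex from this answer
function stripUtmLookbehind(url) {
  return url.replace(/(?<=&|\?)utm_.*?(&|$)/igm, "");
}

// Assumed extra clean-up (not part of the answer): drop a dangling & or ?
// left at the very end when the last param was a utm_ one
function stripUtmTidy(url) {
  return stripUtmLookbehind(url).replace(/[?&]$/, "");
}

console.log(stripUtmLookbehind('https://example.com/p?utm_source=1&utm_medium=2&awc=3')); // https://example.com/p?awc=3
console.log(stripUtmLookbehind('https://example.com/p?awc=3&utm_tt=78')); // https://example.com/p?awc=3& (dangling &)
console.log(stripUtmTidy('https://example.com/p?awc=3&utm_tt=78'));       // https://example.com/p?awc=3
```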
Upvotes: 7