Kolja Siegmund

Reputation: 13

RegEx to filter email addresses from URLs in Google Analytics

I want to use a Google Analytics filter to remove email addresses from incoming URIs. I am using a custom advanced filter that matches Field A (the Request URI) against a RegEx and replaces the matched part afterwards. However, my RegEx does not work correctly. It should find email addresses not only when a literal '@' is used, but also when '(at)', '%40', or '$0040' stands in for the '@'.

My latest RegEx version (see below) still allows '$0040' to go through undetected. Can someone advise me what to change?

^(.*)=([A-Z0-9._%+-]+[@|[\(at\)]|[\$0040]|[\%40]][A-Z0-9.-]+\.[A-Z]{2,4})(.*)$

Upvotes: 0

Views: 622

Answers (1)

Wiktor Stribiżew

Reputation: 626936

I suggest using

([A-Za-z0-9._%+-]+(@|\(at\)|[$]0040|\%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,4})

See the regex demo.

If you need to match the whole string, you may keep that pattern enclosed with your ^(.*) and (.*)$.
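
As a rough illustration of the whole-string variant, here is a minimal Python sketch. It assumes the leading part keeps the `=` from the question's pattern, and it uses Python's `re` module only as a stand-in for the Google Analytics regex engine, which may differ in details. The function name `redact_email` and the placeholder text are hypothetical. Note that the nested alternation group shifts the numbering: the trailing `(.*)` becomes group 4, not group 3.

```python
import re

# Whole-string variant: the question's ^(.*)= prefix and (.*)$ suffix
# wrapped around the suggested email pattern.
# Groups: 1 = text before "=", 2 = email, 3 = "@" substitute, 4 = rest.
FULL_PATTERN = re.compile(
    r"^(.*)=([A-Za-z0-9._%+-]+(@|\(at\)|[$]0040|\%40)"
    r"[A-Za-z0-9.-]+\.[A-Za-z]{2,4})(.*)$"
)


def redact_email(uri, placeholder="[redacted]"):
    """Replace an email-like parameter value in the URI (hypothetical helper)."""
    match = FULL_PATTERN.match(uri)
    if match is None:
        return uri  # nothing that looks like an email address
    # Rebuild the URI from the text before and after the email.
    return f"{match.group(1)}={placeholder}{match.group(4)}"


if __name__ == "__main__":
    print(redact_email("/signup?email=john.doe$0040example.com&ref=nl"))
    print(redact_email("/products?id=42"))
```

The same idea should carry over to the filter's replacement step, as long as the replacement refers to the first and fourth groups rather than the first and third.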

Details

  • ([A-Za-z0-9._%+-]+(@|\(at\)|[$]0040|\%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,4}) - Group 1 capturing
    • [A-Za-z0-9._%+-]+ - 1 or more ASCII letters/digits, ., _, %, +, or -
    • (@|\(at\)|[$]0040|\%40) - one of the alternatives: @, (at), $0040 or %40
    • [A-Za-z0-9.-]+ - 1 or more ASCII letters/digits, . or -
    • \. - a dot
    • [A-Za-z]{2,4} - 2 to 4 ASCII letters.
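
The breakdown above can be checked quickly outside Google Analytics. The following is a minimal sketch in Python, assuming its `re` module treats this pattern the same way as the Analytics filter engine does (an assumption, not something verified here). The helper name `find_email` and the sample URIs are made up for illustration.

```python
import re

# The suggested pattern: local part, one of the four "@" spellings,
# domain, a dot, and a 2 to 4 letter top-level domain.
EMAIL_PATTERN = re.compile(
    r"([A-Za-z0-9._%+-]+(@|\(at\)|[$]0040|\%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,4})"
)


def find_email(uri):
    """Return the first email-like substring in uri, or None (hypothetical helper)."""
    match = EMAIL_PATTERN.search(uri)
    return match.group(1) if match else None


if __name__ == "__main__":
    samples = [
        "/signup?email=jane@example.org",
        "/signup?email=jane(at)example.org",
        "/signup?email=jane$0040example.org",
        "/signup?email=jane%40example.org",
        "/products?id=42",
    ]
    for uri in samples:
        print(uri, "->", find_email(uri))
```

Each alternative is a whole sequence inside a group, so `$0040` is matched as five consecutive characters. Square brackets, by contrast, define a set of single characters, which is why the bracketed version in the question let `$0040` through.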

Upvotes: 1
