Reputation: 13
I want to use a Google Analytics filter to remove email addresses from incoming URIs. I am using the custom advanced filter, filtering field A on a RegEx for the Request URI and replacing the respective part later. However, my RegEx does not seem to work correctly. It should find email addresses, not only if an '@' is used, but also if '(at)', '%40', or '$0040' are used to represent the '@'.
My latest RegEx version (see below) still allows '$0040' to go through undetected. Can someone advise me what to change?
^(.*)=([A-Z0-9._%+-]+[@|[\(at\)]|[\$0040]|[\%40]][A-Z0-9.-]+\.[A-Z]{2,4})(.*)$
Upvotes: 0
Views: 622
Reputation: 626936
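I suggest using a group with alternation. Square brackets define a character class, which matches a single character from the set, so a multi-character sequence such as '$0040' cannot be matched that way: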
([A-Za-z0-9._%+-]+(@|\(at\)|[$]0040|\%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,4})
See the regex demo.
If you need to match the whole string, you may keep that pattern enclosed with your ^(.*) and (.*)$.
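As a rough illustration of the whole-string form, here is a minimal Python sketch. It assumes the `^(.*)=` prefix from the question, and the `redact` helper and the `REDACTED` placeholder are hypothetical names used only for this example. Python's `re` module is not the Google Analytics filter engine, so treat this as a way to check the pattern, not as the filter itself.

```python
import re

# Pattern from the answer: local part, one of the four '@' spellings, then the domain.
EMAIL = r"([A-Za-z0-9._%+-]+(@|\(at\)|[$]0040|%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,4})"

# Whole-string form, assuming the question's '^(.*)=' prefix and '(.*)$' suffix.
WHOLE = re.compile(r"^(.*)=" + EMAIL + r"(.*)$")


def redact(uri: str, placeholder: str = "REDACTED") -> str:
    """Replace an email value after '=' with a placeholder (hypothetical helper)."""
    match = WHOLE.match(uri)
    if match is None:
        return uri  # no email found, leave the URI unchanged
    # Group 1 is the text before '=', group 2 the email, group 3 the '@' variant,
    # and group 4 everything after the email.
    return f"{match.group(1)}={placeholder}{match.group(4)}"


for sample in (
    "/signup?email=john.doe@example.com",
    "/signup?email=john.doe(at)example.com&x=1",
    "/signup?email=john.doe$0040example.com&x=1",
    "/signup?email=john.doe%40example.com",
):
    print(redact(sample))
```

Because the top-level domain is limited to `{2,4}` letters, a longer one such as `.example` would only be matched up to its fourth letter, leaving the remainder in the trailing group.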
Details
([A-Za-z0-9._%+-]+(@|\(at\)|[$]0040|\%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,4}) - Group 1 capturing:
- [A-Za-z0-9._%+-]+ - 1 or more ASCII letters/digits, '.', '_', '%', '+', or '-'
- (@|\(at\)|[$]0040|\%40) - one of the alternatives: '@', '(at)', '$0040', or '%40'
- [A-Za-z0-9.-]+ - 1 or more ASCII letters/digits, '.', or '-'
- \. - a dot
- [A-Za-z]{2,4} - 2 to 4 ASCII letters.
Upvotes: 1