inzanez

Reputation: 366

Regular Expression - Match Email Address with Exceptions

Please read the question carefully, it's not about validating email addresses!

I'm trying to construct a regular expression (currently in C#) that extracts all email addresses from a text, with two specific exceptions.

I got:

all in the same text file on the same line, delimited by whitespace character.

At first I tried to match all of these email addresses except the ones starting with "user1". I used:

[\S]*(?<!user1)@[\S]*\..[a-zA-Z.]{1,}

which works well. Now I have another requirement that says: also do not match if the complete email address is "user2@private.com". Other addresses starting with "user2" should still match, for example "[email protected]", so I can't use:

[\S]*(?<!(user1|user2))@[\S]*\..[a-zA-Z.]{1,}

Therefore I tried an additional negative lookbehind:

([\S]*(?<!user1)@[\S]*\..[a-zA-Z.]{1,})(?<!user2@private\.com)

That doesn't work either; I guess the engine is satisfied with matching "[email protected]" instead. Is there any way to achieve what I'm trying to do? My head already hurts.
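
One way to observe the behaviour described above is the minimal sketch below. It assumes an input consisting only of the address that should be rejected (the input string is an illustrative assumption, not taken from the original text). When the trailing lookbehind rejects the full address, the engine appears to give back one character from the final `[a-zA-Z.]{1,}` and retry, at which point the lookbehind no longer sees the complete address:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the question, including the trailing negative lookbehind.
var pattern = @"([\S]*(?<!user1)@[\S]*\..[a-zA-Z.]{1,})(?<!user2@private\.com)";

// Assumed sample input: just the address that is supposed to be excluded.
var input = "user2@private.com";

var match = Regex.Match(input, pattern);

// The lookbehind fails at the very end of the address, so the engine
// backtracks by one character; the shortened text passes the lookbehind.
Console.WriteLine(match.Success ? match.Value : "(no match)");
// prints: user2@private.co
```

This suggests the exclusion has to be tied to a token boundary (whitespace or end of input) rather than checked only at whatever position the match happens to end.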

I would use additional code, but the third-party software I'm using only accepts a single regular expression, so that's all I've got.

Upvotes: 2

Views: 292

Answers (2)

Wiktor Stribiżew

Reputation: 626932

A single-regex solution, though not a pretty one, is:

(?<!\S)(?!user1@|user2@private\.com(?!\S))\S+@\S+\.[a-zA-Z]{2,}(?!\S)

See the regex demo.

Details:

  • (?<!\S) - a position not immediately preceded by a non-whitespace char (start of string or right after whitespace)
  • (?!user1@|user2@private\.com(?!\S)) - at that position, the text must not start with user1@, nor with user2@private.com as a complete token (that is, not followed by a non-whitespace char)
  • \S+ - one or more non-whitespace chars
  • @ - a literal @
  • \S+ - one or more non-whitespace chars
  • \. - a literal dot
  • [a-zA-Z]{2,}(?!\S) - two or more ASCII letters not followed by a non-whitespace char.
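
As a rough illustration of the pattern above in C#, the sketch below runs it over a sample line. The sample text and its addresses are assumptions made up for this example; only the pattern itself comes from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from the answer: whitespace-delimited tokens that look like
// email addresses, minus "user1@..." and the exact "user2@private.com".
var pattern = @"(?<!\S)(?!user1@|user2@private\.com(?!\S))\S+@\S+\.[a-zA-Z]{2,}(?!\S)";

// Assumed sample input (illustrative addresses only).
var text = "Text user1@private.com here user2@private.com and user2@public.com " +
           "here user3@private.com more user1@public.com";

// Cast<Match>() keeps this working on runtimes where MatchCollection
// only implements the non-generic IEnumerable.
var found = Regex.Matches(text, pattern)
                 .Cast<Match>()
                 .Select(m => m.Value)
                 .ToList();

Console.WriteLine(string.Join(", ", found));
// prints: user2@public.com, user3@private.com
```

The inner `(?!\S)` is what limits the second exclusion to the complete address; without it, any token merely starting with user2@private.com would be rejected as well.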

A more readable approach is to split on whitespace, keep the items matching @"^\S+@\S+\.\S+$", and use a bit of code to filter out unwanted matches:

using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = @"Text [email protected] here [email protected] and [email protected] here [email protected] more [email protected]";
// Split on whitespace, keep email-like tokens, then drop the excluded ones
var result = s.Split().Where(m =>
        Regex.IsMatch(m, @"^\S+@\S+\.\S+$") && m != "user2@private.com" && !m.StartsWith("user1@"));
foreach (var str in result)
    Console.WriteLine(str);
// => [email protected], [email protected]

See C# demo.

Upvotes: 2

JonM

Reputation: 1374

You should be able to use a negative lookahead instead. The following solution should work if you have explicit emails you need to filter out, but keep in mind that it doesn't scale well: you would not want thousands of addresses listed in the pattern.

^(?!user1|user2([email protected]))[\S]*@[\S]*\..[a-zA-Z.]{1,}

If you suspect that many of these rules could be applied at a future date then you might need to think about a better approach. If the emails to be filtered out are explicit (not patterns) then you could maintain a blacklist somewhere and filter them out after you have extracted/validated email address patterns.
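
The blacklist idea could look roughly like the sketch below, assuming code can be run after extraction. The names `blockedAddresses` and `blockedPrefixes`, the generic extraction pattern, and the sample text are all hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical blacklist: exact addresses plus local-part prefixes to drop.
// In practice these might be loaded from a file or a database.
var blockedAddresses = new HashSet<string> { "user2@private.com" };
var blockedPrefixes = new[] { "user1@" };

// Assumed sample input (illustrative addresses only).
var text = "Text user1@private.com here user2@private.com and user2@public.com " +
           "here user3@private.com more user1@public.com";

// Step 1: extract every whitespace-delimited token that looks like an email.
var candidates = Regex.Matches(text, @"(?<!\S)\S+@\S+\.[a-zA-Z]{2,}(?!\S)")
                      .Cast<Match>()
                      .Select(m => m.Value);

// Step 2: filter against the blacklist in code instead of in the pattern.
var kept = candidates
    .Where(a => !blockedAddresses.Contains(a))
    .Where(a => !blockedPrefixes.Any(p => a.StartsWith(p, StringComparison.Ordinal)))
    .ToList();

Console.WriteLine(string.Join(", ", kept));
// prints: user2@public.com, user3@private.com
```

Keeping the rules in data rather than in the pattern means new exclusions can be added without touching the regex, at the cost of needing a post-processing step.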

Upvotes: 1
