Hzyf

Reputation: 1017

Regex email (several types) extraction

I'm trying to extract email addresses from some content. I have a problem with false positives.

My regex for: [email protected]

[^\.^\w+](\w+) *?@ *?(\w+) *?(?:\.|dot) *?(\w+)

Regex for: [email protected]

[^\.^\w+](\w+) *?@ *?(\w+) *?(?:\.|dot) *?(\w+) *?(?:\.|dot) *?(\w+)

I want the first regex not to match: [email protected]

How can I fix it?

Upvotes: 0

Views: 123

Answers (1)

Thomas Lulé

Reputation: 450

The only way to distinguish [email protected] from [email protected] is to maintain a list of valid top-level domains (yes, I'm sorry).

i.e., replace your last (\w+) with (com|org|info|ly|...), and so on.

There is no universal way.
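As a rough sketch of that replacement in Python (an assumption, since the question names no language): the TLD list below is a tiny hypothetical sample, the addresses are made up for illustration, and the leading `[^\.^\w+]` from the question is swapped for a lookbehind so the match can start at the beginning of the text. The trailing `\b` keeps "com" from matching inside a longer word.

```python
import re

# Hypothetical, deliberately incomplete list; a real one would be much longer.
TLDS = ["com", "org", "info", "ly"]
TLD_ALT = "|".join(TLDS)

# Same shape as the first regex in the question, but the last (\w+) is
# restricted to known TLDs. The lookbehind replaces the leading character
# class (an assumption made for this sketch).
first_regex = re.compile(
    r"(?<![\w.+])(\w+) *@ *(\w+) *(?:\.|dot) *(" + TLD_ALT + r")\b"
)

# Two-part domain: matches.
print(first_regex.search("john@example.com"))
# Three-part domain: "example" is not in the TLD list, so no match.
print(first_regex.search("john@mail.example.com"))
```

The list has to be kept up to date by hand, which is the price of this approach.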

Also, you could handle both cases with a single regex.
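One possible way to merge the two patterns, sketched in Python under the same assumptions (hypothetical TLD sample, made-up addresses, a lookbehind in place of the question's leading character class): the middle domain label becomes an optional group, so one pattern covers both address shapes.

```python
import re

# Hypothetical, deliberately incomplete list of top-level domains.
TLDS = ["com", "org", "info", "ly"]
TLD_ALT = "|".join(TLDS)

# Separator between domain labels: a literal dot or the word "dot".
SEP = r" *(?:\.|dot) *"

# Groups: local part, first domain label, optional second label, TLD.
single_regex = re.compile(
    r"(?<![\w.+])(\w+) *@ *(\w+)"
    + r"(?:" + SEP + r"(\w+))?"
    + SEP + r"(" + TLD_ALT + r")\b"
)

for text in ["john@example.com", "john@mail.example.com", "john @ example dot org"]:
    match = single_regex.search(text)
    # The third group is None when the address has only two domain parts.
    print(text, "->", match.groups() if match else None)
```

Checking whether the third group is None tells you which of the two address types was found.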

Also, be careful: my address could be [email protected].

Upvotes: 1
