Reputation: 1424
I have this regex for extracting emails which works fine:
([a-zA-Z][\w\.-]*[a-zA-Z0-9])@([a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])
However, there are some e-mail addresses I don't want to include, like:
[email protected]
[email protected]
[email protected]
I've been trying to add things like ^(?!server|noreplay|name), but it isn't working.
Also, will using parentheses as above affect the results, i.e. will I get tuples of (name, domain)?
Upvotes: 0
Views: 1412
Reputation: 7109
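Filter the results from your regex against a list of unwanted names (the part before the @):
results = list_from_your_regex  # assumed to be full address strings, e.g. 'name@domain'
invalids = ['info', 'server', 'noreply', ...]
# keep only the addresses whose local part is not in the unwanted list
valid_emails = [good for good in results if good.split('@')[0] not in invalids]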
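One caveat: with the two capture groups from the question, re.findall returns (name, domain) tuples rather than strings, so split('@') would not apply directly. A minimal sketch of the same filtering idea on tuples follows; the function name, the sample text and the list of unwanted names are only illustrative assumptions, not something from the question.

```python
import re

# The regex from the question, written as a raw string and split over two lines.
EMAIL_RE = re.compile(
    r"([a-zA-Z][\w\.-]*[a-zA-Z0-9])@"
    r"([a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])"
)

def extract_valid_emails(text, invalid_names=("info", "server", "noreply")):
    """Return full addresses whose local part is not in invalid_names."""
    # Two capture groups, so findall yields (name, domain) tuples.
    pairs = EMAIL_RE.findall(text)
    return [
        f"{name}@{domain}"
        for name, domain in pairs
        if name.lower() not in invalid_names
    ]

# Hypothetical sample input.
sample = "Contact alice.smith@example.com or noreply@example.com, not server@example.org"
valid = extract_valid_emails(sample)
```

Filtering after extraction keeps the regex unchanged, and the unwanted list can grow without touching the pattern.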
Upvotes: 0
Reputation: 4522
Just check for those email addresses after you extract them...
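import re
bad_addresses = ['[email protected]', '[email protected]', '[email protected]']
# No capture groups here, so findall returns whole addresses (strings), not (name, domain) tuples.
emails = re.findall(r'[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]', contentwithemails)
for item in emails[:]:  # iterate over a copy so removals are safe
    if item in bad_addresses:
        emails.remove(item)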
You have to loop over a slice of emails (emails[:]), because removing items from a list while iterating over it makes the loop skip elements. The slice creates a shallow copy that is read while the real list is modified.
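If you would rather exclude those names inside the regex itself, the likely reason ^(?!server|noreplay|name) fails is that ^ only matches at the start of the string (or of a line with re.MULTILINE), not at the start of each address. A rough sketch of one alternative, assuming the unwanted names are info, server and noreply: a lookbehind pins the match to the start of the local part, and a lookahead rejects the unwanted names. It also shows that the two parenthesized groups make findall return (name, domain) tuples; the lookarounds add no groups of their own.

```python
import re

# Assumed list of unwanted local parts; adjust to taste.
EXCLUDED = ("info", "server", "noreply")

pattern = re.compile(
    # Must not be preceded by a character that could belong to a local part,
    # otherwise the match could simply start at "oreply@..." instead.
    r"(?<![\w.-])"
    # Reject the address if the local part is exactly one of the excluded names.
    r"(?!(?:" + "|".join(map(re.escape, EXCLUDED)) + r")@)"
    # The two capture groups from the question: name and domain.
    r"([a-zA-Z][\w\.-]*[a-zA-Z0-9])@"
    r"([a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])"
)

# Hypothetical sample input.
text = "Write to bob@example.com, not noreply@example.com or info@example.org"
matches = pattern.findall(text)  # list of (name, domain) tuples
addresses = ["@".join(pair) for pair in matches]
```

This is case-sensitive as written; passing re.IGNORECASE to re.compile would also reject variants such as NoReply.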
Upvotes: 1