Gavin

Reputation: 1521

Python regular expression to exclude trailing characters from a match

I have a file with rows like:

From [email protected] Fri Jan  4 06:08:27 2008
Received: (from apache@localhost)
Return-Path: <[email protected]>
for <[email protected]>;

I am reading each line and using a regular expression to find the domain name, that is, the portion after the @ sign. Here is the code I wrote:

if re.search('[@]\S+?', line) : org = re.findall('@(\S+)',line)[0]

But it returns the following results:

uct.ac.za
localhost)
collab.sakaiproject.org>
collab.sakaiproject.org>;

Is there a smart way to keep only the domain and exclude the ')', '>' or '>;' that follows the domain name?

Upvotes: 1

Views: 744

Answers (2)

Tim Seed

Reputation: 5279

Slight correction: an FQDN can also include digits, so the regex needs a slight adjustment to:

[@][a-zA-Z0-9.-]+

The full domain rules are described at https://en.wikipedia.org/wiki/Uniform_Resource_Locator
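As a rough sketch of how that character class could be applied to the lines from the question: the capture group around the class, the helper name `extract_domain`, and the local parts `someone` and `user` are assumptions added for illustration (the addresses are obscured in the original post).

```python
import re

# Letters, digits, dots and hyphens after "@"; the group captures only the domain.
DOMAIN_RE = re.compile(r"@([a-zA-Z0-9.-]+)")

# Sample lines modelled on the question; "someone" and "user" are placeholders.
lines = [
    "From someone@uct.ac.za Fri Jan  4 06:08:27 2008",
    "Received: (from apache@localhost)",
    "Return-Path: <user@collab.sakaiproject.org>",
    "for <user@collab.sakaiproject.org>;",
]

def extract_domain(line):
    """Return the domain after the first '@' in line, or None if there is none."""
    match = DOMAIN_RE.search(line)
    return match.group(1) if match else None

domains = [extract_domain(line) for line in lines]
print(domains)
```

Because the class only allows characters that can appear in a host name, the match stops by itself at ')', '>' or ';' without listing them.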

Upvotes: 3

mkHun

Reputation: 5927

Try using a negated character class, [^\>\)\s]+, which stops matching at '>', ')' or whitespace:

if re.search(r'@([^\>\)\s]+)', line): org = re.findall(r'@([^\>\)\s]+)', line)[0]

Output:

uct.ac.za
localhost
collab.sakaiproject.org
collab.sakaiproject.org
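The same idea can probably be written with a single `re.search` call instead of a search followed by `findall`. This is only a sketch: the helper name `domain_of` and the local part `user` are made up for the example, and the backslashes before '>' and ')' are dropped because they are not needed inside a character class.

```python
import re

# Match "@", then capture everything up to the first ">", ")" or whitespace.
NOT_DELIM_RE = re.compile(r"@([^>)\s]+)")

def domain_of(line):
    """Return the text after the first '@', cut at '>', ')' or whitespace."""
    found = NOT_DELIM_RE.search(line)
    return found.group(1) if found else None

print(domain_of("Received: (from apache@localhost)"))
print(domain_of("for <user@collab.sakaiproject.org>;"))
```

One caveat: a character that is not listed in the negated class, such as a ';' directly after the domain, would still be captured, so the class may need extending for other input.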

Upvotes: 2
