Reputation: 1521
I have a file with rows like these:
From [email protected] Fri Jan 4 06:08:27 2008
Received: (from apache@localhost)
Return-Path: <[email protected]>
for <[email protected]>;
I am reading the file line by line and using a regular expression to extract the domain name, that is, the portion after the @ sign. Here is the code I wrote:
if re.search('[@]\S+?', line) : org = re.findall('@(\S+)',line)[0]
But it returns the following results, because \S+ matches every non-whitespace character, including trailing punctuation:
uct.ac.za
localhost)
collab.sakaiproject.org>
collab.sakaiproject.org>;
Is there a clean way to keep only the domain and drop the ')', '>' or '>;' that follows it?
Upvotes: 1
Views: 744
Reputation: 5279
Slight correction: an FQDN can also include digits (and hyphens),
so the regex needs a slight adjustment to
[@][a-zA-Z0-9.-]+
For the full rules on domain names, see https://en.wikipedia.org/wiki/Uniform_Resource_Locator
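As a minimal sketch of this allow-list approach, the snippet below wraps the character class in a capture group so that only the domain (without the @) is returned. The helper name `extract_domain` and the `user` local parts in the sample lines are placeholders for illustration, not taken from the original file.

```python
import re

# Allow-list: letters, digits, dots and hyphens after the @ sign.
DOMAIN_RE = re.compile(r'@([A-Za-z0-9.-]+)')

def extract_domain(line):
    """Return the text after the first @ that looks like a host name, or None."""
    match = DOMAIN_RE.search(line)
    return match.group(1) if match else None

# Sample rows modelled on the question; the local parts are placeholders.
sample_lines = [
    "From user@uct.ac.za Fri Jan 4 06:08:27 2008",
    "Received: (from apache@localhost)",
    "Return-Path: <user@collab.sakaiproject.org>",
    "for <user@collab.sakaiproject.org>;",
]

for line in sample_lines:
    print(extract_domain(line))
```

Because the match stops at the first character outside the class, the trailing ')', '>' and '>;' are dropped. Note that a sentence-ending period directly after an address would still be captured, since '.' is in the class.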
Upvotes: 3
Reputation: 5927
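Try this: use a negated character class, [^>)\s]+, so the match stops at '>', ')' or whitespace. A raw string avoids invalid-escape warnings, and a single search replaces the search-then-findall pair.
m = re.search(r'@([^>)\s]+)', line)
if m: org = m.group(1)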
output
uct.ac.za
localhost
collab.sakaiproject.org
collab.sakaiproject.org
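One trade-off of the negated class is worth seeing in code: it keeps every character that is not explicitly excluded, so punctuation other than '>' and ')' still ends up in the result. The sketch below assumes a hypothetical helper `domain_after_at` and made-up input strings.

```python
import re

# Deny-list: stop at '>', ')' or any whitespace; everything else is kept.
NEGATED_RE = re.compile(r'@([^>)\s]+)')

def domain_after_at(line):
    """Return the run of characters after the first @, up to '>', ')' or whitespace."""
    match = NEGATED_RE.search(line)
    return match.group(1) if match else None

print(domain_after_at("Received: (from apache@localhost)"))    # localhost
print(domain_after_at("for <user@collab.sakaiproject.org>;"))  # collab.sakaiproject.org
print(domain_after_at("cc: user@example.org, other"))          # example.org,  (comma kept)
```

If the input may contain other delimiters, either add them to the excluded set or switch to an allow-list of the characters a host name can contain.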
Upvotes: 2