Reputation: 16189
Given a list of email header fields of type Received, e.g.:
Received: by 10.194.174.73 with SMTP id bq9csp183244wjc;
Mon, 5 May 2014 17:49:10 -0700 (PDT)
X-Received: by 10.180.14.233 with SMTP id s9mr18354760wic.53.1399337350112;
Mon, 05 May 2014 17:49:10 -0700 (PDT)
Received: from mail-wg0-f52.google.com
Received: by mail-ie0-x247.google.com with SMTP id gx4so163592215ieb.1
for <[email protected]>; Mon, 01 Jun 2015 18:34:50 -0700 (PDT)
Each field reports the "hop" either by IP address or domain name. I'm looking to build a regex that will take care of both.
The following regexes extract the IP address and the (Gmail) domain name, respectively:
\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b
mail.*com
What's the most elegant approach to combine two or more patterns in Python? I'll be iterating over a list of Received fields and running the regex against each.
Upvotes: 2
Views: 6689
Reputation: 176
If you just want to capture all the domains and IPs of the hops, you can use a regex like this.
In Python:
import re
# text is assumed to hold the Received header fields from the question
pat = r'(?:by|for|from) <?([^\s;>]+)'
print(re.findall(pat, text))
->
['10.194.174.73', '10.180.14.233', 'mail-wg0-f52.google.com', 'mail-ie0-x247.google.com', '[email protected]']
(edit to also capture the email)
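Since the question mentions iterating over a list of Received fields, here is a minimal sketch of applying the same pattern field by field. The `headers` list is adapted from the question, the recipient address `user@example.com` is a hypothetical placeholder, and the names `HOP` and `extract_hops` are made up for illustration.

```python
import re

# Sample fields adapted from the question; the recipient address is a
# hypothetical placeholder, not taken from the original headers.
headers = [
    "Received: by 10.194.174.73 with SMTP id bq9csp183244wjc;",
    "X-Received: by 10.180.14.233 with SMTP id s9mr18354760wic.53.1399337350112;",
    "Received: from mail-wg0-f52.google.com",
    "Received: by mail-ie0-x247.google.com with SMTP id gx4so163592215ieb.1",
    "for <user@example.com>; Mon, 01 Jun 2015 18:34:50 -0700 (PDT)",
]

# Compile once, since the pattern is reused for every field.
# The capture group grabs whatever follows "by", "for", or "from",
# stopping at whitespace, ";" or ">".
HOP = re.compile(r'(?:by|for|from) <?([^\s;>]+)')


def extract_hops(fields):
    """Return every captured hop (IP, domain, or address) in order."""
    hops = []
    for field in fields:
        # findall returns only the captured group because the pattern
        # contains exactly one capturing group.
        hops.extend(HOP.findall(field))
    return hops


print(extract_hops(headers))
```

Note that the pattern keys on the preceding keyword rather than on the shape of the value, so it would also pick up any other token that happens to follow "by", "for", or "from" in a field.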
Upvotes: 1
Reputation: 54173
Why not use an alternation?
import re
patterns = [r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b", r"mail.*com"]
pattern = "|".join(patterns)  # pattern1|pattern2|pattern3|...
re.findall(pattern, text)  # text is assumed to hold the Received header fields
Yields
['10.194.174.73',
'10.180.14.233',
'mail-wg0-f52.google.com',
'mail-ie0-x247.google.com',
'[email protected]']
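A possible variation on the alternation, if it is also useful to know which sub-pattern matched: wrap each alternative in a named group and read `lastgroup` from each match. This is only a sketch; the group names `ip` and `domain`, the `fields` sample, and the helper `classify_hops` are assumptions made for illustration, and the two patterns are the ones from the question unchanged.

```python
import re

# The two patterns from the question, keyed by a label chosen for this example.
PATTERNS = {
    "ip": r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",
    "domain": r"mail.*com",
}

# Build (?P<ip>...)|(?P<domain>...) and compile it once.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS.items())
)

# A few Received fields taken from the question.
fields = [
    "Received: by 10.194.174.73 with SMTP id bq9csp183244wjc;",
    "Received: from mail-wg0-f52.google.com",
    "Received: by mail-ie0-x247.google.com with SMTP id gx4so163592215ieb.1",
]


def classify_hops(fields):
    """Return (label, matched_text) pairs for every match in every field."""
    results = []
    for field in fields:
        for match in COMBINED.finditer(field):
            # lastgroup is the name of the alternative that matched.
            results.append((match.lastgroup, match.group()))
    return results


for label, value in classify_hops(fields):
    print(label, value)
```

`finditer` is used instead of `findall` here because, once the pattern contains capturing groups, `findall` returns a tuple per match (one slot per group) rather than the matched string. Also keep in mind that `mail.*com` is greedy and its dot matches any character, so on a longer line it can run past the host name to the last "com".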
Upvotes: 3