Pyderman

Reputation: 16189

Python: how best to combine two regexes into one pattern match?

Given a list of email header fields of type Received, e.g.:

Received: by 10.194.174.73 with SMTP id bq9csp183244wjc;
        Mon, 5 May 2014 17:49:10 -0700 (PDT)
X-Received: by 10.180.14.233 with SMTP id s9mr18354760wic.53.1399337350112;
        Mon, 05 May 2014 17:49:10 -0700 (PDT)
Received: from mail-wg0-f52.google.com
Received: by mail-ie0-x247.google.com with SMTP id gx4so163592215ieb.1
        for <[email protected]>; Mon, 01 Jun 2015 18:34:50 -0700 (PDT)

Each field reports the "hop" either by IP address or domain name. I'm looking to build a regex that will take care of both.

The following regexes will extract the IP address and the (Gmail) domain name respectively:

\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b
mail.*com

What's the most elegant way to combine two or more patterns in Python? I'll be iterating over a list of Received fields and running the regex against each.

Upvotes: 2

Views: 6689

Answers (2)

timoh

Reputation: 176

If you just want to capture every domain and IP of the hops, you can use a regex like this.

In Python:

import re
# text is assumed to hold the header fields shown in the question
pat = r'(?:by|for|from) <?([^\s;>]+)'
print(re.findall(pat, text))

->

['10.194.174.73', '10.180.14.233', 'mail-wg0-f52.google.com', 'mail-ie0-x247.google.com', '[email protected]']

(Edited to also capture the email address.)
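
Since the question mentions iterating over a list of fields, here is a minimal sketch of the same pattern applied one field at a time. The helper name extract_hops, the added \b, and the placeholder address user@example.com are my assumptions and not part of the original answer.

```python
import re

# Same idea as the pattern above, with \b added so that "by", "for" and
# "from" only match as whole words (an addition, not in the original answer).
HOP = re.compile(r"\b(?:by|for|from) <?([^\s;>]+)")


def extract_hops(field):
    """Return every host, IP or address that follows by/for/from in one field."""
    return HOP.findall(field)


# Sample fields adapted from the question; user@example.com is a placeholder.
fields = [
    "Received: by 10.194.174.73 with SMTP id bq9csp183244wjc;\n"
    "        Mon, 5 May 2014 17:49:10 -0700 (PDT)",
    "Received: from mail-wg0-f52.google.com",
    "Received: by mail-ie0-x247.google.com with SMTP id gx4so163592215ieb.1\n"
    "        for <user@example.com>; Mon, 01 Jun 2015 18:34:50 -0700 (PDT)",
]

for field in fields:
    print(extract_hops(field))
```

Compiling the pattern once keeps the loop cheap when there are many fields.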

Upvotes: 1

Adam Smith

Reputation: 54173

Why not use an alternation?

import re

patterns = [r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b", r"mail.*com"]
pattern = "|".join(patterns)  # pattern1|pattern2|pattern3|...

# text is assumed to hold the header fields shown in the question
re.findall(pattern, text)

This yields:

['10.194.174.73',
 '10.180.14.233',
 'mail-wg0-f52.google.com',
 'mail-ie0-x247.google.com',
 '[email protected]']
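
If you also need to know which alternative matched, one option is to wrap each pattern in a named group before joining. This is only a sketch: the group names ip and domain are illustrative, and the sample fields are the question's header lines without the continuation lines.

```python
import re

# Wrap each pattern from the answer in a named group; "ip" and "domain"
# are illustrative names, not part of the original answer.
named = {
    "ip": r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",
    "domain": r"mail.*com",
}
pattern = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in named.items())
)

fields = [
    "Received: by 10.194.174.73 with SMTP id bq9csp183244wjc;",
    "X-Received: by 10.180.14.233 with SMTP id s9mr18354760wic.53.1399337350112;",
    "Received: from mail-wg0-f52.google.com",
    "Received: by mail-ie0-x247.google.com with SMTP id gx4so163592215ieb.1",
]

hops = []
for field in fields:
    for m in pattern.finditer(field):
        # m.lastgroup is the name of the alternative that matched.
        hops.append((m.lastgroup, m.group()))

print(hops)
```

Note that mail.*com is greedy, so on a line containing a later "com" the match would extend up to that last occurrence.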

Upvotes: 3
