Richard Simões

Reputation: 12802

Difference in regex behavior between Perl and Python?

I have a couple of email addresses, '[email protected]' and '[email protected]'.

In Perl, I could take the To: line of a raw email and find either of the above addresses with:

/\w+@(tickets\.)?company\.com/i

In Python, I simply wrote the above regex as '\w+@(tickets\.)?company\.com', expecting the same result. However, [email protected] isn't found at all, and a findall on the second returns a list containing only 'tickets.'. So clearly the '(tickets\.)?' is the problem area, but what exactly is the difference in regular expression rules between Perl and Python that I'm missing?

Upvotes: 3

Views: 2326

Answers (4)

Chas. Owens

Reputation: 64929

There isn't a difference in the regexes, but there is a difference in what you are looking for. In both languages, the only thing your regex captures is "tickets." (when it is present). You probably want something like this:

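#!/usr/bin/python

import re

# Capture the whole address; (?:...) keeps "tickets." out of the results.
regex = re.compile(r"(\w+@(?:tickets\.)?company\.com)")

a = [
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
]

for string in a:
    # findall returns the text of group 1, i.e. the full address
    print(regex.findall(string))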
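If you would rather leave the original pattern untouched, one possible alternative is re.finditer, since group(0) of a match object is always the entire match no matter how many capturing groups the pattern has. A minimal sketch, assuming a made-up To: line (the addresses below are hypothetical, not the ones from the question):

```python
import re

# The question's original pattern, capturing group left as-is.
pattern = re.compile(r'\w+@(tickets\.)?company\.com')

# Hypothetical input; these addresses are invented for the example.
line = "To: alice@company.com, bob@tickets.company.com"

# group(0) is the whole match, regardless of capturing groups.
addresses = [m.group(0) for m in pattern.finditer(line)]
print(addresses)
```

This keeps the capturing group available as m.group(1) if you ever need to know whether the "tickets." part was present.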

Upvotes: 1

ephemient

Reputation: 204798

The documentation for re.findall:

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.

Since (tickets\.) is a capturing group, findall returns that instead of the whole match. If you want the whole match, put a group around the whole pattern and/or use a non-capturing group, (?:...), i.e.

r'(\w+@(tickets\.)?company\.com)'
r'\w+@(?:tickets\.)?company\.com'

Note that you'll have to pick out the first element of each tuple returned by findall in the first case.
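To see the three behaviours side by side, here is a small sketch. The sample text and its addresses are hypothetical, chosen only so that each pattern has something to match:

```python
import re

# Hypothetical sample text; the addresses are made up for illustration.
text = "To: alice@company.com, bob@tickets.company.com"

# One capturing group: findall returns only that group's text
# (an empty string where the optional group did not take part).
only_group = re.findall(r'\w+@(tickets\.)?company\.com', text)

# Non-capturing group: findall returns the whole match.
whole_match = re.findall(r'\w+@(?:tickets\.)?company\.com', text)

# Two capturing groups: findall returns a list of tuples,
# so the full address is the first element of each tuple.
tuples = re.findall(r'(\w+@(tickets\.)?company\.com)', text)
first_elements = [t[0] for t in tuples]

print(only_group)
print(whole_match)
print(tuples)
print(first_elements)
```

Note that the first address does match the original pattern; it just shows up as an empty string because the optional group matched nothing.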

Upvotes: 7

chaos

Reputation: 124307

I think the problem is in your expectations of extracted values. Try using this in your current Python code:

'(\w+@(?:tickets\.)?company\.com)'
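One more thing worth checking: the Perl version in the question carries the /i modifier, and the Python string has no equivalent. A rough sketch of how that might look with re.IGNORECASE, assuming case-insensitive matching is actually wanted (the To: line and addresses here are invented for the example):

```python
import re

# re.IGNORECASE plays the role of Perl's /i modifier.
address_re = re.compile(r'(\w+@(?:tickets\.)?company\.com)', re.IGNORECASE)

# Hypothetical header line with mixed-case addresses.
to_line = "To: Alice@Company.COM, bob@Tickets.company.com"

# With a single group around the whole pattern, findall
# returns a flat list of full addresses.
found = address_re.findall(to_line)
print(found)
```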

Upvotes: 4

David Berger

Reputation: 12803

Two problems jump out at me:

  1. You need to use a raw string to avoid having to escape "\"
  2. You need to escape "."

So try:

r'\w+@(tickets\.)?company\.com'

EDIT

Sample output:

>>> import re
>>> exp = re.compile(r'\w+@(tickets\.)?company\.com')
>>> bool(exp.match("[email protected]"))
True
>>> bool(exp.match("[email protected]"))
True
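Keep in mind that match only tries the pattern at the start of the string, so the session above works on a bare address but would not on a full To: line. A small sketch of the difference, assuming a hypothetical header line (the address is made up):

```python
import re

exp = re.compile(r'\w+@(tickets\.)?company\.com')

# Hypothetical To: line; the address is invented for illustration.
to_line = "To: alice@tickets.company.com"

# match anchors at position 0, where "To:" is not an address.
at_start = exp.match(to_line)

# search scans forward until the pattern fits somewhere.
anywhere = exp.search(to_line)

print(at_start)
print(anywhere.group(0))
```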

Upvotes: 2
