Reputation: 12802
I have a couple of email addresses, '[email protected]' and '[email protected]'.
In Perl, I could take the To: line of a raw email and find either of the above addresses with
/\w+@(tickets\.)?company\.com/i
In Python, I simply wrote the above regex as '\w+@(tickets\.)?company\.com' expecting the same result. However, [email protected] isn't found at all, and a findall on the second returns a list containing only 'tickets.'. So clearly the '(tickets\.)?' is the problem area, but what exactly is the difference in regular expression rules between Perl and Python that I'm missing?
Upvotes: 3
Views: 2326
Reputation: 64929
There isn't a difference in the regexes, but there is a difference in what you are asking for. In both languages the only capturing group is "(tickets\.)", so "tickets." is all that gets captured, and only when it is present. You probably want something like this:
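#!/usr/bin/python
import re

# The outer group captures the whole address; (?:...) keeps "tickets."
# optional without capturing it separately.
regex = re.compile(r"(\w+@(?:tickets\.)?company\.com)")
a = [
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
]
for string in a:
    print(regex.findall(string))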
Upvotes: 1
Reputation: 204798
The documentation for re.findall:
findall(pattern, string, flags=0) Return a list of all non-overlapping matches in the string. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
Since (tickets\.) is a capturing group, findall returns that group's text instead of the whole match (an empty string when the optional group did not match). If you want the whole match, put a group around the whole pattern and/or make the inner group non-capturing, i.e.
r'(\w+@(tickets\.)?company\.com)'
r'\w+@(?:tickets\.)?company\.com'
Note that you'll have to pick out the first element of each tuple returned by findall in the first case.
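As a minimal sketch of the three behaviours described above, the snippet below runs findall with each pattern on one header line. The addresses and the to_line variable are made up for illustration, since the original ones are not available here.

```python
import re

# Hypothetical To: line; these addresses are invented for the example.
to_line = "To: alice@company.com, bob@tickets.company.com"

# One capturing group: findall returns only that group's text,
# with '' for a match where the optional group did not participate.
only_group = re.findall(r'\w+@(tickets\.)?company\.com', to_line, re.I)

# Non-capturing group: findall returns each whole match.
whole = re.findall(r'\w+@(?:tickets\.)?company\.com', to_line, re.I)

# Two capturing groups: findall returns (outer, inner) tuples,
# so the full address is the first element of each tuple.
tuples = re.findall(r'(\w+@(tickets\.)?company\.com)', to_line, re.I)
first_elements = [t[0] for t in tuples]

print(only_group)      # ['', 'tickets.']
print(whole)           # ['alice@company.com', 'bob@tickets.company.com']
print(first_elements)  # ['alice@company.com', 'bob@tickets.company.com']
```

The empty string in the first result is why the plain address looks as if it "isn't found at all": it did match, but findall reported only the empty optional group.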
Upvotes: 7
Reputation: 124307
I think the problem is in your expectations of extracted values. Try using this in your current Python code:
'(\w+@(?:tickets\.)?company\.com)'
Upvotes: 4
Reputation: 12803
Two problems jump out at me:
You need to use a raw string to avoid having to escape "\".
You need to escape ".".
So try:
r'\w+@(tickets\.)?company\.com'
EDIT
Sample output:
>>> import re
>>> exp = re.compile(r'\w+@(tickets\.)?company\.com')
>>> bool(exp.match("[email protected]"))
True
>>> bool(exp.match("[email protected]"))
True
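The session above shows that the original pattern does match both kinds of address; only findall's return value changes when a capturing group is present. If you would rather leave the pattern untouched, one option is to read the whole match from each match object, roughly as below. This is a sketch, and the header line and addresses are hypothetical.

```python
import re

# Same pattern as in the question, with the /i flag expressed as re.IGNORECASE.
exp = re.compile(r'\w+@(tickets\.)?company\.com', re.IGNORECASE)

# Hypothetical To: line; these addresses are invented for the example.
to_line = "To: alice@company.com, bob@tickets.company.com"

# group(0) is always the entire match, regardless of capturing groups.
addresses = [m.group(0) for m in exp.finditer(to_line)]

print(addresses)  # ['alice@company.com', 'bob@tickets.company.com']
```

Note that exp.match only tries the pattern at the start of the string, so for a full header line finditer or search is the closer equivalent of an unanchored Perl match.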
Upvotes: 2