Reputation: 31
I am totally confused by grouping in Python. For example:
import re
# Raw string so that \s is passed to the regex engine unchanged
m = re.search(r'From:\s+(.+)\s+To:\s+(.*)\s*', 'From: Toronto To: Beijing')
print(m.group(0))
print(m.group(1))
print(m.group(2))
I get three results when I run this program, but I don't understand why I get them:
From: Toronto To: Beijing
Toronto
Beijing
In addition, sometimes when I use the group method of a match object I get an error saying "no such group".
So, can anyone explain the usage of grouping in regular expressions to me in plain language?
Upvotes: 0
Views: 210
Reputation: 11
As you are probably aware, parentheses in a regular expression create a capturing group (unless you tell it not to, in which case it is a non-capturing group). So the regular expression you present has group 0, which is always the full text matched by the whole pattern, group 1, which corresponds to the (.+), and group 2, which corresponds to the (.*).
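To make the numbering concrete, here is a minimal sketch that reuses the pattern from the question and also asks for a group that does not exist. The variable names (pattern, error_message) are only for illustration, and the point about the error is my reading of the "no such group" message mentioned in the question: the pattern has two pairs of parentheses, so only groups 0, 1 and 2 are available.

```python
import re

# Same pattern as in the question, written as a raw string.
pattern = r'From:\s+(.+)\s+To:\s+(.*)\s*'
m = re.search(pattern, 'From: Toronto To: Beijing')

print(m.group(0))  # the whole match
print(m.group(1))  # text captured by (.+)
print(m.group(2))  # text captured by (.*)

# Only two capturing groups exist, so asking for a third one fails.
error_message = None
try:
    m.group(3)
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```

If you see that error, count the capturing parentheses in your pattern and compare the count with the number you passed to group.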
Upvotes: 1
Reputation: 78800
When you use parentheses in your regular expression, this indicates a group. You do that two times, and the groups are numbered starting at 1, reading from left to right in your regular expression.
Group 0 is a special group for the whole match.
To make a group non-capturing, use (?:something).
Demo:
>>> import re
>>> s = '12 34 56'
>>> m = re.search(r'(\d+)\s+(?:\d+)\s+(\d+)', s)
>>> m.group(0) # everything
'12 34 56'
>>> m.group(1) # content of first capturing group
'12'
>>> m.group(2) # content of second capturing group
'56'
m.groups() will give you the content of all capturing groups, in order, as a tuple:
>>> m.groups()
('12', '56')
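If you are unsure which group numbers are valid for a pattern, one option is to compile it and read its groups attribute, which holds the number of capturing groups. The sketch below uses the same pattern as the demo; the names group_count and values are only for illustration.

```python
import re

pattern = re.compile(r'(\d+)\s+(?:\d+)\s+(\d+)')
m = pattern.search('12 34 56')

# Number of capturing groups; the (?:...) group is not counted.
group_count = pattern.groups
print(group_count)

# Valid arguments to m.group() run from 0 up to group_count inclusive.
values = [m.group(i) for i in range(group_count + 1)]
print(values)
```

Anything above group_count passed to m.group() would raise the "no such group" error.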
Upvotes: 1