Peter

Reputation: 31

Grouping in Regular Expression

I am totally confused by grouping in Python regular expressions. For example:

import re

# Raw string so that \s reaches the regex engine unchanged
m = re.search(r'From:\s+(.+)\s+To:\s+(.*)\s*', 'From: Toronto To: Beijing')
print(m.group(0))
print(m.group(1))
print(m.group(2))

I can get 3 results after I run this program, but I don't know why I get such results:

From: Toronto To: Beijing
Toronto
Beijing

In addition, sometimes when I use the group method of a match-object I will get a warning saying "no such group".

So, can anyone explain the usage of grouping in regular expressions to me in plain language?

Upvotes: 0

Views: 210

Answers (2)

RegexPro

Reputation: 11

As you are probably aware, parentheses in a regular expression create a capturing group (unless you mark the group as non-capturing with (?:...)). In the regular expression you present, group 0 is always the entire text matched by the pattern, group 1 corresponds to (.+), and group 2 corresponds to (.*).
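To make the numbering concrete, here is a minimal sketch that runs the pattern from the question in Python 3 syntax. The variable names `pattern`, `text`, `whole`, `first` and `second` are only illustrative and do not come from the question.

```python
import re

# Pattern and input taken from the question; the raw string keeps \s intact.
pattern = r'From:\s+(.+)\s+To:\s+(.*)\s*'
text = 'From: Toronto To: Beijing'

m = re.search(pattern, text)

whole = m.group(0)   # the entire text matched by the pattern
first = m.group(1)   # what (.+) captured
second = m.group(2)  # what (.*) captured

print(whole)
print(first)
print(second)
```

Group 1 ends at "Toronto" because the `\s+To:` that follows must still match, so the greedy `(.+)` backtracks until it stops just before the whitespace in front of "To:".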

Upvotes: 1

timgeb

Reputation: 78800

When you use parentheses in your regular expression, they create a group. You do that twice, and the groups are numbered starting at 1, counting opening parentheses from left to right.

Group 0 is a special group for the whole match.

To make a group non-capturing, use (?:something). Such a group still takes part in the match, but it gets no number. Demo:

>>> import re
>>> s = '12 34 56'
>>> m = re.search(r'(\d+)\s+(?:\d+)\s+(\d+)', s)
>>> m.group(0) # everything
'12 34 56'
>>> m.group(1) # content of first capturing group
'12'
>>> m.group(2) # content of second capturing group
'56'

m.groups() will give you the contents of all capturing groups, in order:

>>> m.groups()
('12', '56')
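The "no such group" message mentioned in the question is most likely the IndexError that group() raises when you ask for a group number the pattern does not define. A minimal sketch, assuming the same three-number pattern as above (the names `group_count` and `error_message` are illustrative only):

```python
import re

m = re.search(r'(\d+)\s+(?:\d+)\s+(\d+)', '12 34 56')

# Only two groups are numbered, because (?:...) does not capture.
group_count = m.re.groups

try:
    m.group(3)  # there is no third capturing group
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(group_count)
print(error_message)
```

Checking m.re.groups (or len(m.groups())) before calling group(n) is one way to avoid the exception.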

Upvotes: 1
