Arman

Reputation: 194

Getting more matches than expected in Python

I was trying to do some pattern matching in Python, but I don't understand why I get a second match when I am matching only one thing.

import re

def Main():
    m = "12312312ranger12312319"
    pattern = re.compile(r'(\d$)')  # raw string keeps \d intact for the regex engine
    r = pattern.search(m)
    if r:
        print("Matched " + r.group(0) + " Second " + r.group(1))
    else:
        print("Not Matched")

if __name__ == '__main__':
    Main()

This gives me the following output:

Matched 9 Second 9

I think r.group(1) should not be there at all. Am I misunderstanding it?

Upvotes: 1

Views: 90

Answers (3)

Kasravnd

Reputation: 107287

It's because of the $ sign: your pattern matches a single digit at the end of the string. Since the parentheses enclose the whole pattern, group(0) (the entire match) and group(1) (the first parenthesized subgroup) are the same text, so both return 9.

If you don't want group(1), remove the grouping from your pattern and use r'\d$'. Note that the pattern still matches the last character, 9; it just no longer defines a capturing group.
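A minimal sketch of what that change does, written for Python 3 and reusing the string from the question. The variable names (whole, group_count, has_group_1) are only illustrative:

```python
import re

text = "12312312ranger12312319"

# No parentheses: the pattern still matches, but defines no capturing group.
match = re.search(r"\d$", text)
whole = match.group(0)          # the entire match
group_count = match.re.groups   # number of capturing groups in the pattern

try:
    match.group(1)              # there is no group 1 any more
    has_group_1 = True
except IndexError:
    has_group_1 = False

print(whole, group_count, has_group_1)
```

Here group(0) is still '9', while asking for group(1) raises IndexError because the pattern has no parenthesized group.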

From the Python re documentation:

group() Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.

Example:

>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m.group(0)       # The entire match
'Isaac Newton'
>>> m.group(1)       # The first parenthesized subgroup.
'Isaac'
>>> m.group(2)       # The second parenthesized subgroup.
'Newton'
>>> m.group(1, 2)    # Multiple arguments give us a tuple.
('Isaac', 'Newton')
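The last two sentences of the quoted description (a group that did not take part in the match, and a group that matched several times) can be checked with a small sketch like this one. The patterns and input strings are made up for illustration:

```python
import re

# A group inside a part of the pattern that did not match yields None.
alt = re.match(r"(a)|(b)", "b")
unmatched_group = alt.group(1)   # the (a) branch never matched
matched_group = alt.group(2)     # the (b) branch did

# A group that matched multiple times keeps only its last match.
rep = re.match(r"(..)+", "a1b2c3")
last_repeat = rep.group(1)
whole_repeat = rep.group(0)

print(unmatched_group, matched_group)
print(last_repeat, whole_repeat)
```

In the second case group(0) covers the whole string 'a1b2c3', but group(1) only holds 'c3', the final repetition.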

Upvotes: 0

Avinash Raj

Reputation: 174706

Because you're both matching and capturing the last digit, which sits at the end of the line, group(0) and group(1) refer to the same text. (\d$) doesn't only capture; it also does the matching. So group(0) prints the whole matched text, and group(1) prints the characters captured by group index 1.
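To see matching and capturing as two separate things, here is a rough Python 3 sketch that compares the original pattern with a non-capturing version, (?:\d$). The non-capturing variant is my own addition for comparison, not something from the question:

```python
import re

text = "12312312ranger12312319"

capturing = re.search(r"(\d$)", text)        # parentheses match and capture
non_capturing = re.search(r"(?:\d$)", text)  # (?:...) groups without capturing

capturing_whole = capturing.group(0)
capturing_groups = capturing.groups()          # tuple of all captured groups
non_capturing_whole = non_capturing.group(0)
non_capturing_groups = non_capturing.groups()

print(capturing_whole, capturing_groups)
print(non_capturing_whole, non_capturing_groups)
```

Both patterns match the same text, so group(0) is '9' in each case. Only the capturing version has anything in groups().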

Upvotes: 1

Mikhail Gerasimov

Reputation: 39546

group(0) always returns the whole text that was matched, regardless of whether it was captured in a group. See this example:

import re

def Main():
    m = "12312312ranger12312319"
    pattern = re.compile(r'\d(\d$)')  # only the last digit is inside the group
    r = pattern.search(m)
    if r:
        print(r.group(0) + ' ' + r.group(1))
    else:
        print("Not Matched")

if __name__ == '__main__':
    Main()

Output:

19 9
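If it helps, the same difference shows up in the match positions. This is a small Python 3 sketch using span(), with variable names chosen just for this illustration:

```python
import re

text = "12312312ranger12312319"
r = re.search(r"\d(\d$)", text)

# span(n) gives the (start, end) indices of group n within the string.
whole_text, whole_span = r.group(0), r.span(0)
captured_text, captured_span = r.group(1), r.span(1)

print(whole_text, whole_span)
print(captured_text, captured_span)
```

The whole match covers the last two characters of the 22-character string, (20, 22), while group 1 covers only the last one, (21, 22).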

Upvotes: 5
