Reputation: 1732
What is the process of matching this regular expression? I don't get why the explicit group is 'c'. This piece of code is taken from the Python re module documentation.
>>> m = re.match("([abc])+", "abc")
>>> m.group()
'abc'
>>> m.groups()
('c',)
Also, what about:
>>> m = re.match("([abc]+)", "abc")
>>> m.group()
'abc'
>>> m.groups()
('abc',)
And:
>>> m = re.match("([abc])", "abc")
>>> m.group()
'a'
>>> m.groups()
('a',)
Thanks.
Upvotes: 4
Views: 180
Reputation: 142136
re.match("([abc])+", "abc")
Matches a group consisting of a single a, b or c, repeated one or more times. Matching is greedy, so the repetition consumes all of abc. Each repetition overwrites what the group captured before, so the group ends up holding only the last character matched, which is c.
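A quick sketch of that "last repetition wins" behaviour, in case it helps. The extra outer parentheses and the findall call are my own additions for illustration, not something from the docs example:

```python
import re

# The inner group is repeated, so it keeps only its last iteration.
# The outer group (added here for illustration) wraps the whole repetition.
m = re.match(r"(([abc])+)", "abc")
outer, inner = m.groups()
print(outer, inner)

# To collect every character, match them one at a time instead.
each = re.findall(r"[abc]", "abc")
print(each)
```

Here outer is the full run and inner is just the final character; findall is one way to get each character separately.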
m = re.match("([abc]+)", "abc")
Matches a group that contains one or more consecutive occurrences of a, b or c. The group at the end is the longest contiguous run of a, b or c at the start of the string.
re.match("([abc])", "abc")
Matches exactly one a, b or c. Since re.match only matches at the start of the string, the group will always be the first character of the string, provided it is an a, b or c.
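To illustrate the "start of the string" part, here is a small sketch. The input "xabc" is a made-up string, and re.search is shown only for contrast:

```python
import re

# re.match only tries position 0; 'x' is not in the class, so there is no match.
anchored = re.match(r"([abc])", "xabc")
print(anchored)

# re.search scans forward and finds the first a, b or c anywhere in the string.
scanned = re.search(r"([abc])", "xabc")
print(scanned.group(1))
```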
Upvotes: 6
Reputation: 18633
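In your first example, ([abc])+ matches one a, b, or c at a time and repeats. There is only one group, and each repetition overwrites what it captured. c is the explicit group because it's the last character that the regex matches. If the string ends in a instead, the group is a: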
>>> re.match("([abc])+", "abca").groups()
('a',)
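If you want to see which part of the string the group ended up on, the match object's span method shows it. A minimal sketch using the same pattern and string:

```python
import re

m = re.match(r"([abc])+", "abca")

# Group 0 is the whole match; group 1 is the last repetition only.
whole_span = m.span(0)
group_span = m.span(1)
print(whole_span, group_span)
```

The whole match covers all four characters, while the group covers only the final one.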
In your second example, you're creating one group that matches one or more a's, b's, or c's in a row. Thus, you create one group for abc. If we extend abc, the group will extend with the string:
>>> re.match("([abc]+)", "abca").groups()
('abca',)
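The group only extends as long as the characters keep matching. A small sketch with a made-up string that has an x in the middle:

```python
import re

# The run stops at 'x', so the group does not include the second "abc".
m = re.match(r"([abc]+)", "abcxabc")
run = m.group(1)
print(run, m.end())
```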
In your third example, the regex is searching for exactly one character that is either an a, a b, or a c. Since a is the first character in abc, you get an a. This changes if we change the first character in the string:
>>> re.match("([abc])", "cba").group()
'c'
Upvotes: 3