knd

Reputation: 1732

How does the Python re module work in this example?

What is the process of matching this regular expression? I don't get why the explicit group is 'c'. This piece of code is taken from the Python re module documentation.

>>> m = re.match("([abc])+", "abc")
>>> m.group()
'abc'
>>> m.groups()
('c',)

Also, what about:

>>> m = re.match("([abc]+)", "abc")
>>> m.group()
'abc'
>>> m.groups()
('abc',)

And:

>>> m = re.match("([abc])", "abc")
>>> m.group()
'a'
>>> m.groups()
('a',)

Thanks.

Upvotes: 4

Views: 180

Answers (2)

Jon Clements

Reputation: 142136

re.match("([abc])+", "abc")

The group matches a single character that is a, b or c, and the + repeats that group one or more times. Each repetition overwrites what the group captured previously. Matching is greedy, so the whole of abc is consumed and the group ends up holding only the last character matched, which is c.

m = re.match("([abc]+)", "abc")

Matches a group that contains one or more consecutive occurrences of a, b or c. The group at the end is the longest contiguous run of a, b or c starting at the beginning of the string.

re.match("([abc])", "abc")

Matches a single character that is either a, b or c. Because re.match only matches at the start of the string, the group will always be the first character of the string, provided it is one of those three.
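To make the difference between the three patterns concrete, here is a minimal sketch that prints the whole match, group 1, and the span of group 1 for each. The variable names (text, repeated, inner, single) are only illustrative and do not come from the question.

```python
import re

text = "abc"

# Repeated group: the group matches one character per repetition,
# and only the capture from the last repetition is kept.
repeated = re.match(r"([abc])+", text)
print(repeated.group(0), repeated.group(1), repeated.span(1))  # abc c (2, 3)

# Repetition inside the group: the group captures the whole run.
inner = re.match(r"([abc]+)", text)
print(inner.group(0), inner.group(1), inner.span(1))  # abc abc (0, 3)

# No repetition: only the first character is matched and captured.
single = re.match(r"([abc])", text)
print(single.group(0), single.group(1), single.span(1))  # a a (0, 1)
```

The span of group 1 in the first case, (2, 3), shows that the group covers only the final character even though the overall match covers all of abc.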

Upvotes: 6

Nolen Royalty

Reputation: 18633

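In your first example, ([abc])+ matches the group once for each a, b, or c character it finds, and each new match replaces the previous capture. c is the explicit group because it's the last character that the regex matches. If we extend abc, the captured character changes with it: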

>>> re.match("([abc])+", "abca").groups()
('a',)

In your second example, you're creating one group that matches one or more a's, b's, or c's in a row. Thus, you create one group for abc. If we extend abc, the group will extend with the string:

>>> re.match("([abc]+)", "abca").groups()
('abca',)

In your third example, the regex is searching for exactly one character that is either an a, a b, or a c. Since a is the first character in abc, you get an a. This changes if we change the first character in the string:

>>> re.match("([abc])", "cba").group()
'c'
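If the goal with the first pattern was to get every matched character rather than only the last one, one possible approach (an assumption about intent, not something the question asks for) is to drop the repeated group and collect the individual matches with re.findall. The names text and chars are illustrative.

```python
import re

text = "abca"

# A repeated capturing group keeps only its last capture.
last_only = re.match(r"([abc])+", text).groups()
print(last_only)  # ('a',)

# re.findall with a pattern that has no groups returns every match
# as a list of strings, so each character is kept.
chars = re.findall(r"[abc]", text)
print(chars)  # ['a', 'b', 'c', 'a']
```

Note that re.findall scans the whole string instead of anchoring at the start, so it behaves differently from re.match when the string contains characters outside the class.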

Upvotes: 3
