Reputation: 2207
* : 0 or more occurrences of the pattern to its left
+ : 1 or more occurrences of the pattern to its left
? : 0 or 1 occurrences of the pattern to its left
How is "+?" equivalent to "*" ?
Consider a search for any 3 letter word if it exists.
re1.search(r,'(\w\w\w)*,"abc")
In case of re1, * tries to get either 0 or more occurrences of the pattern to its left which in this case is the group of 3 letters. So it will either try to find a 3 letter word or fail
re2.search(r,'(\w\w\w)+?,"abc")
In case of re2, it's supposed to give the same output but I'm confused as to why "*" and "?+" are equivalent. Can you please explain this ?
Upvotes: 3
Views: 128
Reputation: 336128
*
and +?
are not equivalent. The ?
takes on a special meaning if it follows a quantifier, making that quantifier lazy.
Usually, quantifiers are greedy, meaning they will try to match as many repetitions as they can; lazy quantifiers match as few as they can. But a+?
will still match at least one a
.
In [1]: re.search("(a*)(.*)", "aaaaaa").groups()
Out[1]: ('aaaaaa', '')
In [2]: re.search("(a+?)(.*)", "aaaaaa").groups()
Out[2]: ('a', 'aaaaa')
In your example, both regexes happen to match the same text because both (\w\w\w)*
and (\w\w\w)+?
can match three letters, and there are exactly three letters in your input. But they will differ in other strings:
In [12]: re.search(r"(\w\w\w)+?", "abcdef")
Out[12]: <_sre.SRE_Match object; span=(0, 3), match='abc'>
In [13]: re.search(r"(\w\w\w)+?", "ab") # No match
In [14]: re.search(r"(\w\w\w)*", "abcdef")
Out[14]: <_sre.SRE_Match object; span=(0, 6), match='abcdef'>
In [15]: re.search(r"(\w\w\w)*", "ab")
Out[15]: <_sre.SRE_Match object; span=(0, 0), match=''>
Upvotes: 4
Reputation: 1
If you run with a simpler expression you will see is not the same:
import re
>>> re.search("[0-9]*", "1")
<_sre.SRE_Match object; span=(0, 1), match='1'>
>>> re.search("[0-9]*", "")
<_sre.SRE_Match object; span=(0, 0), match=''>
>>> re.search("[0-9]+", "")
>>> re.search("[0-9]+", "1")
<_sre.SRE_Match object; span=(0, 1), match='1'>
The problem in your code is (words)+?. is one or more or nothing
Upvotes: -1