Dhiwakar Ravikumar

Reputation: 2207

Python Regular Expressions - How is "+?" equivalent to "*"

   * : 0 or more occurrences of the pattern to its left
   + : 1 or more occurrences of the pattern to its left     
   ? : 0 or 1 occurrences of the pattern to its left

How is "+?" equivalent to "*" ?

Consider a search for any 3 letter word if it exists.

re1 = re.search(r'(\w\w\w)*', "abc")

In the case of re1, * tries to match 0 or more occurrences of the pattern to its left, which here is the group of 3 letters. So it will either find a 3 letter word or fail.

re2 = re.search(r'(\w\w\w)+?', "abc")

In the case of re2, it's supposed to give the same output, but I'm confused as to why "*" and "+?" are equivalent. Can you please explain this?

Upvotes: 3

Views: 128

Answers (2)

Tim Pietzcker

Reputation: 336128

* and +? are not equivalent. The ? takes on a special meaning if it follows a quantifier, making that quantifier lazy.

Usually, quantifiers are greedy, meaning they will try to match as many repetitions as they can; lazy quantifiers match as few as they can. But a+? will still match at least one a.

In [1]: re.search("(a*)(.*)", "aaaaaa").groups()
Out[1]: ('aaaaaa', '')

In [2]: re.search("(a+?)(.*)", "aaaaaa").groups()
Out[2]: ('a', 'aaaaa')

In your example, both regexes happen to match the same text because both (\w\w\w)* and (\w\w\w)+? can match three letters, and there are exactly three letters in your input. But they will differ in other strings:

In [12]: re.search(r"(\w\w\w)+?", "abcdef")
Out[12]: <_sre.SRE_Match object; span=(0, 3), match='abc'>

In [13]: re.search(r"(\w\w\w)+?", "ab") # No match

In [14]: re.search(r"(\w\w\w)*", "abcdef")
Out[14]: <_sre.SRE_Match object; span=(0, 6), match='abcdef'>

In [15]: re.search(r"(\w\w\w)*", "ab")
Out[15]: <_sre.SRE_Match object; span=(0, 0), match=''>
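To see all four quantifier variants side by side, here is a minimal sketch that runs each pattern against the same two inputs used above. The helper name matched_text and the results dictionary are only illustrative, not part of the re module.

```python
import re

def matched_text(pattern, text):
    """Return the text matched by re.search, or None when there is no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Greedy and lazy forms of * and +, applied to the same three-letter group.
patterns = [r"(\w\w\w)*", r"(\w\w\w)+", r"(\w\w\w)*?", r"(\w\w\w)+?"]

# For each pattern, record the match on a six-letter and a two-letter input.
results = {p: [matched_text(p, s) for s in ("abcdef", "ab")] for p in patterns}

for pattern, matches in results.items():
    print(pattern, matches)
```

An empty string in the output means the pattern matched zero repetitions, while None means it did not match at all. Only the + variants can return None here, because they require at least one repetition.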

Upvotes: 4

Omar Bazavilvazo

Reputation: 1

If you try a simpler expression, you will see that they are not the same:

>>> import re
>>> re.search("[0-9]*", "1")
<_sre.SRE_Match object; span=(0, 1), match='1'>
>>> re.search("[0-9]*", "")
<_sre.SRE_Match object; span=(0, 0), match=''>
>>> re.search("[0-9]+", "")
>>> re.search("[0-9]+", "1")
<_sre.SRE_Match object; span=(0, 1), match='1'>

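The problem in your code is (\w\w\w)+?. The + still requires one or more occurrences, and the ? after it only makes the quantifier lazy; it does not make the group optional.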
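As a quick check with the same character class, this small sketch tries the lazy form +? on an empty string and on a run of digits. The variable names are only illustrative.

```python
import re

# Lazy "one or more": needs at least one digit, so it finds nothing in "".
lazy_plus_empty = re.search(r"[0-9]+?", "")

# On a run of digits, the lazy form stops after the first one.
lazy_plus_digits = re.search(r"[0-9]+?", "123")

# Greedy "zero or more" takes the whole run.
star_digits = re.search(r"[0-9]*", "123")

print(lazy_plus_empty)
print(lazy_plus_digits.group(0))
print(star_digits.group(0))
```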

Upvotes: -1
