N0thing

Reputation: 7155

RegEx: difference between (?:...) and normal parentheses

>>> re.findall(r"(?:do|re|mi)+", "mimi")
['mimi']
>>> re.findall(r"(do|re|mi)+", "mimi")
['mi']

According to my understanding of the definitions, both calls should produce the same result. The only difference between (...) and (?:...) should be whether or not we can use back-references later. Am I missing something?

(...)

Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(], [)].

(?:...)

A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

Upvotes: 1

Views: 626

Answers (3)

Birei

Reputation: 36282

The (?:...) groups without capturing. So, in

re.findall(r"(?:do|re|mi)+", "mimi")

findall() returns one value for each match of the whole regular expression. Here the pattern matches the mi string twice within a single match, so the result is a list with one element, mimi.

The (...) captures, and when the pattern contains a capturing group, findall() returns the contents of that group instead of the whole match. In

re.findall(r"(do|re|mi)+", "mimi")

the regex matches mi and saves it as group 1, then continues and matches mi again inside the same parentheses, which overwrites group 1. In the end findall() returns the value of group 1, which is only the second mi.
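To see both behaviours side by side, here is a small sketch using only the standard re module. The variable names are just for illustration. It also shows one possible workaround if you want the repeated text while still using a capturing group: wrap the whole repetition in an outer group.

```python
import re

text = "mimi"

# Non-capturing group: findall() returns the whole match.
whole = re.findall(r"(?:do|re|mi)+", text)

# Capturing group under a quantifier: group 1 keeps only its last
# repetition, and findall() returns group 1 rather than the whole match.
last = re.findall(r"(do|re|mi)+", text)

# Workaround: capture the entire repetition with an outer group and keep
# the inner alternation non-capturing.
wrapped = re.findall(r"((?:do|re|mi)+)", text)

# A match object exposes both views at once.
m = re.search(r"(do|re|mi)+", text)

print(whole)       # ['mimi']
print(last)        # ['mi']
print(wrapped)     # ['mimi']
print(m.group(0))  # mimi (whole match)
print(m.group(1))  # mi   (last repetition of group 1)
```

So the match itself is identical in both cases; only what findall() hands back differs.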

Upvotes: 4

Bohemian

Reputation: 425378

Although the matching is the same, the non-capturing version (?:...):

  • is slightly more efficient, because the effort of storing a reference to the captured group is avoided
  • may allow more captured groups. Some regex flavours limit back-references to 9 separate groups, because only a single digit is used to name them (group zero is the entire match, hence 9 not 10). Python's re module is less strict: \number can refer to any of the first 99 groups.
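As a rough illustration of the second point, the sketch below counts how many group numbers each pattern uses. The extra (fa) group and the sample string are made up for the example; Pattern.groups is the standard attribute that reports the number of capturing groups.

```python
import re

# Same matching behaviour, but the first pattern spends a group number
# on the alternation and the second does not.
capturing = re.compile(r"(do|re|mi)+(fa)")
non_capturing = re.compile(r"(?:do|re|mi)+(fa)")

print(capturing.groups)      # 2
print(non_capturing.groups)  # 1

# With the non-capturing version, (fa) is group 1, so \1 refers to it.
m = re.fullmatch(r"(?:do|re|mi)+(fa)\1", "mimifafa")
print(m.group(1))            # fa
```

This does not measure the efficiency claim; it only shows that non-capturing groups leave the numbering free for the groups you actually care about.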

Upvotes: 1

Kevin Reid

Reputation: 43942

No, you're not missing anything. The use of (?:...) is to be able to group things without putting unneeded items in the backreferences/matched substrings.
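A minimal sketch of what "unneeded items" means in practice, using a made-up pattern and input string: with two capturing groups findall() returns tuples, while making the second group non-capturing leaves only the part of interest.

```python
import re

sizes = "10px 2em"

# Both parts captured: findall() returns a tuple per match.
both = re.findall(r"(\d+)(px|em)", sizes)

# Unit grouped but not captured: only the number comes back.
numbers = re.findall(r"(\d+)(?:px|em)", sizes)

print(both)     # [('10', 'px'), ('2', 'em')]
print(numbers)  # ['10', '2']
```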

Upvotes: 1
