Reputation: 263350
Are optional non-capturing groups redundant?
Is the following regex:
(?:wo)?men
semantically equivalent to the following regex?
(wo)?men
Upvotes: 9
Views: 2001
Reputation: 31389
A question elsewhere asked the same thing, and I answered it with an example in Python:
It doesn't "have the same effect" - in one case the group is captured and accessible, in the other it is only used to complete the match.
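A minimal sketch using the question's own two patterns on the (illustrative) input string "women". Both accept the whole word, but only the capturing version keeps the "wo" part around as group 1.

```python
import re

text = "women"  # illustrative input

# Capturing group: the optional "wo" is stored as group 1.
cap = re.fullmatch(r"(wo)?men", text)
# Non-capturing group: same overall match, but nothing is stored.
noncap = re.fullmatch(r"(?:wo)?men", text)

print(cap.group(0), cap.groups())        # women ('wo',)
print(noncap.group(0), noncap.groups())  # women ()
```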
People use non-capturing groups when they are not interested in accessing the value of the group: to save memory when there are many matches, but also for better performance in cases where the regex engine is optimised for it.
A contrived example in Python to illustrate the point:
from timeit import timeit
import re
# 100,000 characters of repeating 'abcdefghij'
chars = 'abcdefghij'
s = ''.join(chars[i % len(chars)] for i in range(100000))
def capturing():
    # ten nested capturing groups
    re.findall('(a(b(c(d(e(f(g(h(i(j))))))))))', s)
def noncapturing():
    # nine nested non-capturing groups around a single capturing group
    re.findall('(?:a(?:b(?:c(?:d(?:e(?:f(?:g(?:h(?:i(j))))))))))', s)
print(timeit(capturing, number=1000))
print(timeit(noncapturing, number=1000))
Output:
5.8383678999998665
1.0528525999998237
Note: this holds in spite of PyCharm (if you happen to use it) warning "Unnecessary non-capturing group". The warning is correct, but clearly not the whole truth: the group is logically unneeded, yet it definitely does not have the same practical effect.
If the reason you wanted to get rid of them was to suppress such warnings, PyCharm allows you to do so with this:
# noinspection RegExpUnnecessaryNonCapturingGroup
re.findall('(?:a(?:b(?:c(?:d(?:e(?:f(?:g(?:h(?:i(j))))))))))', s)
Another note for the pedantic: the examples above aren't strictly logically equivalent either. They match the same strings, but return different results.
c = re.findall('(a(b(c(d(e(f(g(h(i(j))))))))))', s)
nc = re.findall('(?:a(?:b(?:c(?:d(?:e(?:f(?:g(?:h(?:i(j))))))))))', s)
c is a list of 10-tuples ([('abcdefghij', 'bcdefghij', ..), ..]), while nc is a list of single strings (['j', ..]).
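The same "same matches, different results" effect shows up with the question's own patterns. A minimal sketch, assuming an illustrative input "men and women": re.findall returns group values when a pattern has capturing groups and whole matches when it has none, while re.finditer with group(0) shows that both patterns matched the same substrings.

```python
import re

text = "men and women"  # illustrative input

# Capturing group: findall returns the group's value for each match,
# or '' when the optional group did not take part in that match.
cap = re.findall(r"(wo)?men", text)
# No capturing groups: findall returns the whole matched text instead.
noncap = re.findall(r"(?:wo)?men", text)
# finditer + group(0) gives the full matches of the capturing pattern.
whole = [m.group(0) for m in re.finditer(r"(wo)?men", text)]

print(cap)     # ['', 'wo']
print(noncap)  # ['men', 'women']
print(whole)   # ['men', 'women']
```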
Upvotes: 0
Reputation: 627409
Your (?:wo)?men and (wo)?men are semantically equivalent, but technically different: the first uses a non-capturing group and the second a capturing group. So the question is: why use non-capturing groups when we have capturing ones?
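One way to sanity-check the "semantically equivalent" claim is to run both patterns over a few inputs and compare which ones they accept. The sample strings below are hypothetical, and this is a spot check under that assumption, not a proof.

```python
import re

capturing = re.compile(r"(wo)?men")
non_capturing = re.compile(r"(?:wo)?men")

# Hypothetical inputs, including some that should be rejected.
samples = ["men", "women", "wmen", "omen", "wowomen", "", "Women"]

results = {}
for s in samples:
    # Does each pattern accept the entire string?
    a = capturing.fullmatch(s) is not None
    b = non_capturing.fullmatch(s) is not None
    results[s] = (a, b)
    print(f"{s!r:10} capturing={a} non_capturing={b}")
```

Every row shows the same True/False for both patterns; only the captured groups differ.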
Non-capturing groups are sometimes helpful.
They also make match results cleaner, because only the groups you actually need are captured:
You can use a non-capturing group to retain the organisational or grouping benefits but without the overhead of capturing.
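A short sketch of those grouping benefits, using illustrative strings: a non-capturing group lets a quantifier or an alternation apply to a whole sub-pattern, while the match object stays free of captured groups.

```python
import re

# The group lets + repeat "ab" as a unit...
m = re.fullmatch(r"(?:ab)+", "ababab")
print(m.group(0), m.groups())  # ababab ()

# ...whereas without a group, + applies only to "b".
print(re.fullmatch(r"ab+", "ababab"))  # None

# Grouping an alternation works the same way; with no capturing groups,
# findall returns the whole matches.
words = re.findall(r"(?:wo)?m(?:e|a)n", "man, woman, men, women")
print(words)  # ['man', 'woman', 'men', 'women']
```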
It does not seem a good idea to refactor existing regular expressions to convert capturing groups into non-capturing ones, since this may break code that relies on the group numbers, or require too much effort.
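As an illustration of how such a refactoring can break things, here is a sketch with a hypothetical two-group variant of the question's pattern: once the first group stops capturing, every later group shifts down by one, so code that still asks for the old group number fails.

```python
import re

text = "women"  # illustrative input

# Original pattern: (wo) is group 1, so (men) is group 2.
before = re.fullmatch(r"(wo)?(men)", text)
print(before.group(2))  # men

# "Refactored" pattern: (wo) no longer captures, so (men) becomes group 1.
after = re.fullmatch(r"(?:wo)?(men)", text)
print(after.group(1))  # men

# Code that still asks for group 2 now fails.
try:
    after.group(2)
    broke = False
except IndexError:
    broke = True
print("group(2) raises IndexError:", broke)  # group(2) raises IndexError: True
```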
Upvotes: 12