basil_man

Reputation: 604

Python regex non-capturing not working within capturing groups

I was working on a regular expression in Python to extract groups. I am correctly extracting the 3 groups I want (symbol, num, atom). However, I expected the 'symbol' group not to include the '[' or ']', since I am using the non-capturing notation (?:...) per Python's docs (https://docs.python.org/3/library/re.html).

Am I understanding non-capturing wrong, or is this a bug?

Thanks!

import re

result = re.match(r'(?P<symbol>(?:\[)(?P<num>[0-9]{0,3})(?P<atom>C)(?:\]))', '[12C]')

print(result.groups())
# ('[12C]', '12', 'C')
# expected: ('12C', '12', 'C')

Upvotes: -1

Views: 53

Answers (1)

flakes

Reputation: 23674

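Move the checks for \[ and \] outside of the (?P<symbol>...) capture group. A non-capturing group only avoids creating a group of its own; whatever it matches is still part of any capture group that encloses it. With the brackets moved out, you also no longer need the non-capturing notation, e.g.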

>>> import re
>>> result = re.match(r'\[(?P<symbol>(?P<num>[0-9]{0,3})(?P<atom>C))]', '[12C]')
>>> result.groups()
('12C', '12', 'C')
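
To make the difference concrete, here is a minimal side-by-side sketch of the two patterns from this thread. The variable names (pattern_inside, pattern_outside, inside, outside) are only illustrative and not from the original posts.

```python
import re

# (?:...) does not create a group of its own, but the text it matches
# is still part of any capture group that encloses it.
pattern_inside = re.compile(
    r'(?P<symbol>(?:\[)(?P<num>[0-9]{0,3})(?P<atom>C)(?:\]))'
)

# Brackets matched outside the 'symbol' group, as in the answer above.
pattern_outside = re.compile(
    r'\[(?P<symbol>(?P<num>[0-9]{0,3})(?P<atom>C))]'
)

inside = pattern_inside.match('[12C]')
outside = pattern_outside.match('[12C]')

print(inside.group('symbol'))   # [12C]
print(outside.group('symbol'))  # 12C
print(outside.group(0))         # [12C]  (the whole match still includes the brackets)
```

Note that the brackets are still required for a match in the second pattern; they are just no longer part of the 'symbol' group.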

Upvotes: 2
