Reputation: 209
I'm having trouble getting set operators to work in the regex module (regex 2013-11-29) on Python 3.x. For example, to match ASCII characters minus punctuation, I have tried:
import regex as rx
data = '(foo)'
for m in rx.finditer(r'[\p{ASCII}--\p{P}]+', data):
    print(m.group(0))  # expect 'foo', getting '(foo)'
The documentation gives this example:
[\p{N}--[0-9]] # Set containing all numbers except '0' .. '9'
Am I missing something here?
Upvotes: 1
Views: 672
Reputation: 101
It sounds like you need to explicitly opt into Version 1 behavior so that -- is interpreted as the set difference operator rather than as literal hyphen characters to include in the class.
From the module web page:
Version 1 behaviour (new behaviour, different from the current re module):
Indicated by the VERSION1 or V1 flag, or (?V1) in the pattern.
.split will split a string at a zero-width match.
Inline flags apply to the end of the group or pattern, and they can be turned off.
Nested sets and set operations are supported.
Case-insensitive matches in Unicode use full case-folding by default.
If no version is specified, the regex module will default to regex.DEFAULT_VERSION. In the short term this will be VERSION0, but in the longer term it will be VERSION1.
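Applied to the pattern from the question, that would look roughly like the sketch below. It assumes the third-party regex module is installed, and the variable names are just for illustration. Either the inline (?V1) prefix or the V1 flag should switch on set operations:

```python
import regex as rx

data = '(foo)'
pattern = r'[\p{ASCII}--\p{P}]+'

# Option 1: opt in to Version 1 inline, at the start of the pattern.
inline = rx.findall('(?V1)' + pattern, data)

# Option 2: opt in by passing the V1 (a.k.a. VERSION1) flag.
flagged = rx.findall(pattern, data, flags=rx.V1)

print(inline)   # ['foo']
print(flagged)  # ['foo']
```

Without the flag, the set is read under Version 0 rules, which is why the parentheses were still matching in the original attempt.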
Upvotes: 1