ethann

Reputation: 209

Using set operators with python regex module

I'm having trouble getting set operators to work in the regex module (regex 2013-11-29) on Python 3.x. For example, to match ASCII characters other than punctuation, I have tried:

import regex as rx

data = '(foo)'
for m in rx.finditer(r'[\p{ASCII}--\p{P}]+',data):
    print(m.group(0))     # expect 'foo', getting '(foo)'

The documentation gives this example:

[\p{N}--[0-9]] # Set containing all numbers except '0' .. '9'

Am I missing something here?

Upvotes: 1

Views: 672

Answers (1)

arjache

Reputation: 101

It sounds like you need to explicitly opt into Version 1 behavior so that the -- is interpreted as a set operator and not as characters to include in the class.

From the module web page:

Version 1 behaviour (new behaviour, different from the current re module):

Indicated by the VERSION1 or V1 flag, or (?V1) in the pattern.

.split will split a string at a zero-width match.

Inline flags apply to the end of the group or pattern, and they can be turned off.

Nested sets and set operations are supported.

Case-insensitive matches in Unicode use full case-folding by default.

If no version is specified, the regex module will default to regex.DEFAULT_VERSION. In the short term this will be VERSION0, but in the longer term it will be VERSION1.
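
Applied to the pattern from the question, opting in could look like the sketch below. It assumes a regex release that exposes the V1 flag described above; the variable names are only for illustration.

```python
import regex as rx

data = '(foo)'

# Option 1: pass the V1 flag so that -- is parsed as set difference.
with_flag = rx.findall(r'[\p{ASCII}--\p{P}]+', data, flags=rx.V1)

# Option 2: turn on Version 1 behaviour inline, inside the pattern itself.
inline = rx.findall(r'(?V1)[\p{ASCII}--\p{P}]+', data)

print(with_flag)  # ['foo']
print(inline)     # ['foo']
```

Either form should work; the inline (?V1) keeps the setting attached to the pattern, which helps if the pattern is stored or passed around separately from the call that uses it.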

Upvotes: 1
