User4679

Reputation: 111

Python regex with negated pattern

I'm trying to write a regex in Python with a negated pattern. I want to match a W that is not preceded by a U and is optionally followed by a 1. Below are some examples.

TUW1TH > # regex does not get applied
JUWRG > # regex does not get applied
BUIUW1 > # regex does not get applied
ATWKO > ATW KO # regex applies and space is added after the W
EWRG > E WRG # regex applies and space is added after the W
AGDTWSD > AGDTW SD # regex applies and space is added after the W

Below is the regex statement I tried to use:

 re.sub(ur"[^U]W[^?1]", ur"W ", word)

Upvotes: 2

Views: 110

Answers (3)

RootTwo

Reputation: 4418

I think you are asking to match a 'W' optionally followed by a '1', but only if the 'W' is not preceded by a 'U'. If that is the case, a "negative look behind" is the answer:

import re

testcases = ['TUW1TH', 'JUWRG', 'BUIUW1', 'ATWKO', 'EWRG', 'AGDTWSD', 'W1EF', 'EW1RG']

# The `(W1?)` part matches a 'W' with an optional '1'. The `(?<!U)` part
#     matches at the current position only if it is not preceded by a 'U'.
pattern = re.compile(r'(?<!U)(W1?)')

for s in testcases:
    print(pattern.sub(r'\1 ', s))

outputs:

TUW1TH
JUWRG
BUIUW1
ATW KO
EW RG
AGDTW SD
W1 EF
EW1 RG

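Note: [^U] doesn't work at the beginning of a line, because it has to consume an actual character before the W. The lookbehind consumes nothing, which is why 'W1EF' still matches above.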
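
To make that difference concrete, here is a small side-by-side sketch. The names `consuming` and `lookbehind` are only illustrative labels, not anything from the question:

```python
import re

# [^U] must match a real character, so a W at index 0 can never be matched.
consuming = re.compile(r'([^U]W1?)')
# (?<!U) only asserts that the previous character is not 'U'; it matches
# at the start of the string too, because nothing precedes the W there.
lookbehind = re.compile(r'(?<!U)(W1?)')

for s in ['W1EF', 'EW1RG', 'TUW1TH']:
    print(s, '->', consuming.sub(r'\1 ', s), '|', lookbehind.sub(r'\1 ', s))
```

Only the first test case should differ between the two: the consuming pattern leaves 'W1EF' untouched, while the lookbehind version inserts the space.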

Upvotes: 2

mhawke

Reputation: 87064

Try the regex pattern r'([^U]W)1?' and use it with re.sub() with a substitution that references the captured group, like this:

import re

pattern = re.compile(r'([^U]W)1?')
for s in 'TUW1TH', 'JUWRG', 'BUIUW1', 'ATWKO', 'EWRG', 'AGDTWSD':
    print(pattern.sub(r'\1 ', s))

Output

TUW1TH
JUWRG
BUIUW1
ATW KO
EW RG
AGDTW SD

Note that the output for 'EWRG' differs from your sample... I think that's a typo in your question?

Your question wasn't clear about what to do with the optional 1 following the W and there was no sample to demonstrate. Is the 1 to be removed, or kept? The above code will lose the 1:

>>> print(pattern.sub(r'\1 ', 'TW1TH'))
TW TH

If you wanted the output to include the 1, then you can change the regex pattern to r'([^U]W)(1?)' to add a second capturing group for the optional 1, and change the substitution to r'\1 \2':

>>> re.sub(r'([^U]W)(1?)', r'\1 \2', 'TW1TH')
'TW 1TH'

Upvotes: 0

Chris Kitching

Reputation: 2655

Looks like you want [^U]W1?

You used [^?1], a negated character class that matches any single character other than ? or 1, instead of 1?, which means "optionally a 1".
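
A quick way to see what the original attempt does is to look at what it matches. This is just a sketch using the pattern from the question (without the Python 2 `ur` prefix); the variable names are illustrative:

```python
import re

attempt = r'[^U]W[^?1]'   # pattern from the question
suggested = r'[^U]W1?'    # pattern suggested in this answer

# The trailing class consumes one more character, and since the replacement
# is the literal 'W ', both neighbours of the W are lost.
print(re.findall(attempt, 'ATWKO'))     # ['TWK']
print(re.sub(attempt, 'W ', 'ATWKO'))   # AW O

# A '1' after the W is excluded by the class, so nothing matches at all.
print(re.findall(attempt, 'ATW1KO'))    # []
print(re.findall(suggested, 'ATW1KO'))  # ['TW1']
```

Note that even with the suggested pattern you still need a capturing group in the substitution if you want to keep the character before the W.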

Upvotes: 0
