Reputation: 111
I'm trying to write a regex in Python with a negated pattern. I want to match a pattern that doesn't start with a U followed by a W and optionally ends with a 1. Below are some examples.
TUW1TH > # regex does not get applied
JUWRG > # regex does not get applied
BUIUW1 > # regex does not get applied
ATWKO > ATW KO # regex applies and space is added after the W
EWRG > E WRG # regex applies and space is added after the W
AGDTWSD > AGDTW SD # regex applies and space is added after the W
Below is the regex statement I tried to use:
re.sub(ur"[^U]W[^?1]", ur"W ", word)
Upvotes: 2
Views: 110
Reputation: 4418
I think you are asking to match a 'W' optionally followed by a '1', but only if the 'W' is not preceded by a 'U'. If that is the case, a negative lookbehind is the answer:
import re
testcases = ['TUW1TH', 'JUWRG', 'BUIUW1', 'ATWKO', 'EWRG', 'AGDTWSD', 'W1EF', 'EW1RG']
# The `(W1?)` part matches a 'W' with an optional '1'. The `(?<!U)` part
# matches the current position only if it isn't preceded by a 'U'.
pattern = re.compile(r'(?<!U)(W1?)')
for s in testcases:
    print(pattern.sub(r'\1 ', s))
outputs:
TUW1TH
JUWRG
BUIUW1
ATW KO
EW RG
AGDTW SD
W1 EF
EW1 RG
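Note: [^U] doesn't work at the beginning of a line, because it has to consume a character in front of the W. The lookbehind only checks what precedes the W, so it also matches a W at the very start.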
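To make that difference concrete, here is a small sketch that runs both styles of pattern on the same inputs. The variable names are only for illustration, and the consuming pattern deliberately leaves the leading character uncaptured to show what gets lost.

```python
import re

# Consuming version: [^U] must match a real character before the W,
# and that character is replaced along with the rest of the match.
consuming = re.compile(r'[^U](W1?)')
# Lookbehind version: only asserts that no 'U' sits before the W.
lookbehind = re.compile(r'(?<!U)(W1?)')

start_consuming = consuming.sub(r'\1 ', 'W1EF')
start_lookbehind = lookbehind.sub(r'\1 ', 'W1EF')
mid_consuming = consuming.sub(r'\1 ', 'ATWKO')
mid_lookbehind = lookbehind.sub(r'\1 ', 'ATWKO')

print(start_consuming)   # W1EF   (no match: nothing precedes the W)
print(start_lookbehind)  # W1 EF
print(mid_consuming)     # AW KO  (the 'T' was consumed and dropped)
print(mid_lookbehind)    # ATW KO
```

With a consuming class, the preceding character has to be captured and put back in the replacement; the lookbehind avoids that bookkeeping.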
Upvotes: 2
Reputation: 87064
Try the regex pattern r'([^U]W)1?' and use it with re.sub() with a substitution that references the captured group, like this:
import re
pattern = re.compile(r'([^U]W)1?')
for s in 'TUW1TH', 'JUWRG', 'BUIUW1', 'ATWKO', 'EWRG', 'AGDTWSD':
    print(pattern.sub(r'\1 ', s))
Output:
TUW1TH
JUWRG
BUIUW1
ATW KO
EW RG
AGDTW SD
Note that the output for 'EWRG' differs from your sample... I think that's a typo in your question?
Your question wasn't clear about what to do with the optional 1 following the W, and there was no sample to demonstrate. Is the 1 to be removed, or kept? The above code will lose the 1:
>>> print(pattern.sub(r'\1 ', 'TW1TH'))
TW TH
If you want the output to include the 1, you can change the regex pattern to r'([^U]W)(1?)' to add a second capturing group for the optional 1, and change the substitution to r'\1 \2':
>>> re.sub(r'([^U]W)(1?)', r'\1 \2', 'TW1TH')
'TW 1TH'
Upvotes: 0
Reputation: 2655
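Looks like you want [^U]W1?
You used [^?1], which is a character class meaning "any one character other than ? or 1", instead of the token "optionally a 1". The ? quantifier has to go outside a class, directly after the 1.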
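As a rough illustration of what the original attempt actually does, here is the pattern from the question run on a few inputs. It is written as a plain raw string (the `ur` prefix in the question is Python 2 only), and the variable names are just for this sketch.

```python
import re

# Pattern from the question: any non-'U' character, a 'W', then any
# one character that is neither '?' nor '1'.
original = re.compile(r'[^U]W[^?1]')

applied = {word: original.sub('W ', word) for word in ['ATWKO', 'EWRG', 'TW1TH']}

print(applied['ATWKO'])  # AW O   ('TWK' is replaced, so 'T' and 'K' are lost)
print(applied['EWRG'])   # W G    ('EWR' is replaced)
print(applied['TW1TH'])  # TW1TH  (no match: the W is followed by a '1')
```

All three characters of the match are replaced by 'W ', which is why the surrounding letters disappear unless they are captured and referenced in the substitution.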
Upvotes: 0