Reputation: 355
I have the following scenarios:
1) car on the right shoulder
2) car on the left shoulder
3) car on the shoulder
I want to match "shoulder" only when left|right is not present, so only 3) should return "shoulder".
re.compile(r'(?<!left|right\s*)shoulder')
sre_constants.error: look-behind requires fixed-width pattern
It seems like I can't use \s* and "|" inside the look-behind.
How can I solve this?
Thanks in advance!
Upvotes: 17
Views: 6818
Reputation: 41838
regex module: variable-width lookbehind
In addition to the answer by HamZa, for any regex of any complexity in Python, I recommend using the outstanding regex module by Matthew Barnett. It supports infinite lookbehind, making it one of the few engines to do so, along with .NET and JGSoft.
This allows you to do for instance:
import regex
if regex.search("(?<!right |left )shoulder", "left shoulder"):
    print("It matches!")
else:
    print("Nah... No match.")
You could also use \s+ if you wished.
Output:
Nah... No match.
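For contrast, here is a minimal sketch of how the standard-library re module treats the same pattern. The helper name compiles_in_re is made up for illustration. Because "right " and "left " have different lengths, re should reject the alternation, while two separate fixed-width lookbehinds are accepted:

```python
import re

def compiles_in_re(pattern):
    """Return True if the standard-library re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for, among other things, variable-width lookbehinds.
        return False

# Alternatives of different widths inside one lookbehind: rejected by re.
variable_width_ok = compiles_in_re(r"(?<!right |left )shoulder")

# One fixed-width lookbehind per word: accepted by re.
fixed_width_ok = compiles_in_re(r"(?<!right )(?<!left )shoulder")

print(variable_width_ok, fixed_width_ok)
```

So if installing a third-party module is not an option, splitting the alternation into one lookbehind per word is a workaround, as long as each word has a fixed length.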
Upvotes: 29
Reputation: 31706
The need for variable width look-behind can be avoided by combining a fixed-width positive look-behind with a negative look-ahead:
re.split('(?<=[\u4e00-\u9fff])(?![\u4e00-\u9fff])', '缩头乌龟suō tóu wūguī', 1)
# ['缩头乌龟', 'suō tóu wūguī']
Upvotes: 0
Reputation: 14921
In most regex engines, lookbehinds need to be of fixed width. This means you can't use quantifiers such as +, * or ? in a lookbehind in Python. The solution is to move \s* outside your lookbehind. Python's re also requires every alternative inside a lookbehind to have the same width, so left|right has to be split into two lookbehinds:
(?<!left)(?<!right)\s*shoulder
You will notice that this expression matches every combination: \s* can match zero characters, and the lookbehinds are then checked directly before "shoulder", where the preceding character is a space. So we need to change the quantifier from * to +:
(?<!left)(?<!right)\s+shoulder
The only problem with this solution is that it won't find shoulder if it's at the beginning of the string, so we might add an alternative with an anchor:
^shoulder|(?<!left)(?<!right)\s+shoulder
If you want to get rid of the whitespace, just use the strip function.
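A minimal sketch of this approach run against the three scenarios from the question, using only the standard-library re module. The helper name find_shoulder is made up for illustration. Note that it uses two separate lookbehinds, since Python's re rejects (?<!left|right) because the two alternatives differ in width:

```python
import re

# One fixed-width lookbehind per word, with the whitespace matched outside them.
# The ^shoulder alternative covers "shoulder" at the very start of the string.
PATTERN = re.compile(r"^shoulder|(?<!left)(?<!right)\s+shoulder")

def find_shoulder(text):
    """Return the match with surrounding whitespace stripped, or None."""
    match = PATTERN.search(text)
    return match.group().strip() if match else None

for sentence in ("car on the right shoulder",
                 "car on the left shoulder",
                 "car on the shoulder"):
    print(sentence, "->", find_shoulder(sentence))
```

Only the third sentence should yield "shoulder". One caveat of this sketch: with more than one space between the words (for example "left  shoulder"), the match can start at the second space and slip past the lookbehinds, so it assumes single spacing.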
Upvotes: 3