Regex that matches punctuation at the word boundary including underscore

Question

I am looking for a Python regex for a variable phrase with the following properties. (For the sake of example, assume the phrase is and, but I need whatever plays the role of and to be passed in as a variable, which I'll call phrase.)

Should match: this_and, this.and, (and), [and], and^, ;And, etc.

Should not match: land, andy

This is what I tried so far (where phrase is playing the role of and):

pattern = r"\b" + re.escape(phrase.lower()) + r"\b"

This seems to work for all my requirements, except that it does not match words with underscores, e.g. _hello, hello_, hello_world.

Edit: Ideally I would like to use the standard library re module rather than any external packages.

Wiktor Stribiżew · Accepted Answer

You may use

r'(?<![^\W_])and(?![^\W_])'

See the regex demo. Compile with the re.I flag to enable case-insensitive matching.

Details


(?<![^\W_]) - the preceding char should not be a letter or digit char
and - some keyword
(?![^\W_]) - the next char cannot be a letter or digit
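
To illustrate the details above, here is a minimal sketch comparing the \b approach from the question with the lookaround approach. The variable names word_boundary and alnum_boundary are illustrative only and do not come from the original answer.

```python
import re

phrase = "and"  # example value; any string could be passed in
escaped = re.escape(phrase)

# \b treats "_" as a word character, so there is no boundary
# between "_" and an adjacent letter.
word_boundary = re.compile(r"\b" + escaped + r"\b", re.I)

# [^\W_] matches a word character that is not "_", i.e. a letter or digit.
# The negative lookarounds forbid such a character on either side.
alnum_boundary = re.compile(r"(?<![^\W_])" + escaped + r"(?![^\W_])", re.I)

for s in ["this_and", "and_that", "land"]:
    print(s, bool(word_boundary.search(s)), bool(alnum_boundary.search(s)))
```

The underscore cases fail with \b but pass with the lookarounds, while land is rejected by both.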


Python demo:

import re
strs = ['this_and', 'this.and', '(and)', '[and]', 'and^', ';And', 'land', 'andy']
phrase = "and"
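rx = re.compile(r'(?<![^\W_]){}(?![^\W_])'.format(re.escape(phrase)), re.I)
for s in strs:
    print("{}: {}".format(s, bool(rx.search(s))))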

Output:

this_and: True
this.and: True
(and): True
[and]: True
and^: True
;And: True
land: False
andy: False
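
Since the question asks for the phrase to be passed in as a variable, the same pattern could be wrapped in a small helper. This is only a sketch: make_phrase_pattern and contains_phrase are hypothetical names, not part of the answer or of the re module.

```python
import re

def make_phrase_pattern(phrase, flags=re.I):
    """Compile a pattern that matches `phrase` only when it is not
    directly preceded or followed by a letter or digit."""
    # re.escape keeps regex metacharacters in the phrase literal.
    return re.compile(r"(?<![^\W_])" + re.escape(phrase) + r"(?![^\W_])", flags)

def contains_phrase(text, phrase):
    """Return True if `phrase` occurs in `text` as a standalone token."""
    return make_phrase_pattern(phrase).search(text) is not None

print(contains_phrase("this_and", "and"))
print(contains_phrase("andy", "and"))
print(contains_phrase("I like c++.", "c++"))
```

Because the lookarounds test the neighbouring characters rather than the phrase itself, this also works for phrases that begin or end with punctuation, such as c++, where \b would not find a boundary after the final +.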
