Reputation: 432
I have sentences in which I want to identify words, but not if the word is immediately preceded by a non-alphanumeric character. It's fine if it is followed by one, though.
An example of what I've done:
import re

words = ["THIS", "THAT"]
sentences = ["I want to identify THIS word.", "And THAT!", "But I do not want to identify !THIS word", "Or [THIS] word"]
for sentence in sentences:
    for word in words:
        # \b matches on both sides of the word, even next to punctuation
        word_re = re.search(r"\b(%s)\b" % word, sentence)
        if word_re:
            print("It's a match!")
The code above reports a match in each of the sentences. I would like something that only matches in the first two sentences. Is it possible to do what I want with regex?
Thanks!
Upvotes: 2
Views: 868
Reputation: 626926
You can use a regex like
(?<!\S)(?:THIS|THAT)\b
See the regex demo. Details:
(?<!\S) - a left-hand whitespace boundary
(?:THIS|THAT) - a non-capturing group matching either THIS or THAT
\b - a word boundary.
See the Python demo:
import re
words = ["THIS", "THAT"]
sentences = ["I want to identify THIS word.", "And THAT!", "But I do not want to identify !THIS word", "Or [THIS] word"]
pattern = fr"(?<!\S)(?:{'|'.join(words)})\b"
for sentence in sentences:
    word_re = re.search(pattern, sentence)
    if word_re:
        print(f"'{sentence}' is a match!")
# => 'I want to identify THIS word.' is a match!
# 'And THAT!' is a match!
If THIS or THAT can contain special chars, replace pattern = fr"(?<!\S)(?:{'|'.join(words)})\b" with pattern = fr"(?<!\S)(?:{'|'.join(map(re.escape, words))})\b".
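To illustrate why escaping matters, here is a minimal sketch comparing the two patterns. The word "A.B", the sample sentences, and the helper name build_pattern are made up for this example and are not part of the question. Without re.escape, the dot is treated as "any character", so "AxB" would also match.

```python
import re

def build_pattern(words, escape=True):
    """Build the whitespace-boundary pattern, optionally escaping each word.

    build_pattern is a hypothetical helper used only for this illustration.
    """
    parts = map(re.escape, words) if escape else words
    return fr"(?<!\S)(?:{'|'.join(parts)})\b"

# Hypothetical word list: "A.B" contains the regex metacharacter "."
words = ["A.B", "THIS"]
sentences = ["Use A.B here", "Use AxB here", "Not !A.B here"]

for sentence in sentences:
    unescaped = bool(re.search(build_pattern(words, escape=False), sentence))
    escaped = bool(re.search(build_pattern(words), sentence))
    print(f"{sentence!r}: unescaped={unescaped}, escaped={escaped}")
```

Note that the trailing \b still requires the word to end in a word character. If a word can end in punctuation, a different right-hand boundary such as (?!\w) may be a better fit, depending on what should count as a match.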
Upvotes: 2