Reputation: 133
import re

op = ['TRAIL_RATE_ID 8 TRAIL_RATE_NAME VC-4 TRAIL_ORDER High Order ', 'TRAIL_RATE_ID 9 TRAIL_RATE_NAME VC4-4 TRAIL_ORDER High Order ' , 'TRAIL_RATE_ID 10 TRAIL_RATE_NAME VC-8 TRAIL_ORDER High Order ']
word = "8"
for op1 in op:
    # Search each string for the word surrounded by word boundaries
    pp = re.search('(\\b' + word + '\\b)', op1, flags=re.IGNORECASE|re.DOTALL)
    print(bool(pp))
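This matches 2 occurrences of 8: the standalone 8 in the first string and the 8 in VC-8 in the third string.
I want it to match only the first occurrence. The word can be anything, for example word = "8", word = "$#hhd", or word = "hi hello".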
How do I match this using regex?
Upvotes: 2
Views: 1203
Reputation: 785128
Word boundaries won't help because - is not considered a word character, so \b matches between - and 8 in VC-8.
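To make that concrete, here is a minimal sketch using shortened versions of the strings from the question (the `samples` list is made up for illustration). It shows that `\b8\b` succeeds on both strings, since the hyphen counts as a boundary just like a space does:

```python
import re

# \b sits between a word character (letter, digit, underscore) and anything else,
# so the hyphen before the 8 in "VC-8" is a boundary, the same as a space would be.
samples = ["ID 8 NAME VC-4", "ID 10 NAME VC-8"]  # shortened, illustrative inputs
boundary_hits = [bool(re.search(r"\b8\b", s)) for s in samples]
print(boundary_hits)  # [True, True]
```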
You can use lookarounds:
p = re.compile(r'(?:(?<=^)|(?<=\s))' + word + r'(?=\s|$)', flags=re.IGNORECASE|re.M)
re.search(p, op1)
(?<=^)|(?<=\s) is a lookbehind to ensure we have line start or whitespace before our word.
(?=\s|$) is a lookahead to ensure we have line end or whitespace after our word.
Upvotes: 4
Reputation: 626802
You can require that there is no non-whitespace character on either side of the word:
r'(?<!\S){0}(?!\S)'.format(re.escape(word))
See the regex demo
I added re.escape(word) in case your keywords contain special regex metacharacters that should be treated literally.
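As a rough illustration of why the escaping matters, take the "$#hhd" keyword from the question. The `text` string below is an invented example. Unescaped, the leading `$` is read as an end-of-string anchor, so the pattern can never match; escaped, it is treated as a literal dollar sign:

```python
import re

word = "$#hhd"
text = "id $#hhd end"  # illustrative input, not from the question

# Unescaped: "$" anchors at the end of the string, so "#hhd" can never follow it.
raw_match = re.search(word, text)

# Escaped: every character of the keyword is matched literally.
escaped_match = re.search(re.escape(word), text)

print(raw_match)              # None
print(escaped_match.group())  # $#hhd
```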
See Python demo:
import re
word = "8"
pat = r'(?<!\S){0}(?!\S)'.format(re.escape(word))
print(re.search(pat, "nnn 8", flags=re.IGNORECASE))
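Applied to the list from the question, the same pattern might be wrapped like this. This is only a sketch: `contains_token` is a hypothetical helper name, and the short test strings for "$#hhd" and "hi hello" are invented for illustration.

```python
import re

def contains_token(word, text):
    """Return True if `word` occurs in `text` with whitespace or a string end on each side."""
    # (?<!\S) = no non-whitespace character before; (?!\S) = none after.
    pattern = r"(?<!\S){0}(?!\S)".format(re.escape(word))
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

op = [
    "TRAIL_RATE_ID 8 TRAIL_RATE_NAME VC-4 TRAIL_ORDER High Order ",
    "TRAIL_RATE_ID 9 TRAIL_RATE_NAME VC4-4 TRAIL_ORDER High Order ",
    "TRAIL_RATE_ID 10 TRAIL_RATE_NAME VC-8 TRAIL_ORDER High Order ",
]

results = [contains_token("8", line) for line in op]
print(results)  # [True, False, False]

# The other keyword shapes mentioned in the question (inputs are made up):
print(contains_token("$#hhd", "id $#hhd end"))            # True
print(contains_token("hi hello", "say hi hello there"))   # True
print(contains_token("hi hello", "oh hi hellos"))         # False
```

The 8 in VC-8 is rejected because the character before it is a hyphen, which is non-whitespace, so the lookbehind fails.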
Upvotes: 6