Reputation: 85
Can anyone help me form a regex that matches the pattern dd-ddd as a whole word in a sentence? For example, in a sentence like this, 11-222 should be matched at the beginning of the sentence, as well as 33-444 in the middle, but not 55-66-777, since the whole word does not match the pattern. If the pattern is present at the end, it should also be matched, like 88-999.
If I use a regex like '\b\d{2}-\d{3}\b', it even matches the 66-777 inside 55-66-777. I need to exclude that. It seems the hyphen (-) is treated as a word boundary.
Any idea how I can achieve this?
Added sample code and output
import re
regex_str = r'\b\d{2}-\d{3}\b'
msg_message = '11-222 should be matched, as well as 33-444 but not 55-66-777. If it is present at the end, that should also be matched like 88-999'
for match in re.finditer(regex_str, msg_message):
    print('*'*15)
    print(match.group(0))
    print(match.span())
Output:
***************
11-222
(0, 6)
***************
33-444
(37, 43)
***************
66-777
(55, 61)
***************
88-999
(125, 131)
Upvotes: 2
Views: 46
Reputation: 25489
You could use a negative lookbehind so that your pattern matches only when it is not preceded by a hyphen:
(?<!\-)\d{2}\-\d{3}
import re
regex_str = r'\b(?<!\-)\d{2}\-\d{3}\b'
msg_message = '11-222 should be matched, as well as 33-444 but not 55-66-777. If it is present at the end, that should also be matched like 88-999'
for match in re.finditer(regex_str, msg_message):
    print('*'*15)
    print(match.group(0))
    print(match.span())
***************
11-222
(0, 6)
***************
33-444
(37, 43)
***************
88-999
(125, 131)
You could do the same with a negative lookahead (?!\-) if you want to apply the same treatment to the right side of your expression.
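As a rough sketch of that suggestion, the lookbehind and lookahead can be combined so that a hyphen on either side disqualifies the match. The function name find_codes and the second test string are made up for illustration; the hyphens are left unescaped because a bare hyphen outside a character class needs no backslash.

```python
import re

# Hypothetical helper: dd-ddd as a word, with no hyphen directly before or after.
PATTERN = re.compile(r'(?<!-)\b\d{2}-\d{3}\b(?!-)')

def find_codes(text):
    """Return every dd-ddd token that is not glued to a neighbouring hyphen."""
    return PATTERN.findall(text)

msg_message = ('11-222 should be matched, as well as 33-444 but not 55-66-777. '
               'If it is present at the end, that should also be matched like 88-999')

print(find_codes(msg_message))             # 66-777 is skipped: a hyphen precedes it
print(find_codes('11-222-33 and 44-555'))  # 11-222 is skipped: a hyphen follows it
```

With only the lookbehind, the 11-222 in 11-222-33 would still match, which is why the right-hand lookahead may be worth adding.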
Upvotes: 1
Reputation: 22817
You can use (?<!\S)\d{2}-\d{3}(?!\S). This pattern ensures a whitespace character (or no character, i.e. start/end of string) before and after the match.
How it works:
(?<!\S) ensures that what precedes is not a non-whitespace character
\d{2} matches two digits
- matches this character literally
\d{3} matches three digits
(?!\S) ensures that what follows is not a non-whitespace character
The double negatives are used purposely. The alternative is to use (?<=\s|^) and (?=\s|$) respectively (but it's longer and less sexy).
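A minimal sketch of this whitespace-delimited pattern applied to the question's sample sentence; the function name find_whitespace_delimited and the extra test string are invented for illustration.

```python
import re

# Hypothetical helper: dd-ddd surrounded by whitespace or the string edges.
PATTERN = re.compile(r'(?<!\S)\d{2}-\d{3}(?!\S)')

def find_whitespace_delimited(text):
    """Return dd-ddd tokens that have no non-whitespace neighbour on either side."""
    return PATTERN.findall(text)

msg_message = ('11-222 should be matched, as well as 33-444 but not 55-66-777. '
               'If it is present at the end, that should also be matched like 88-999')

print(find_whitespace_delimited(msg_message))
# Trailing punctuation counts as non-whitespace, so neither token matches here.
print(find_whitespace_delimited('33-444, then 12-345.'))
```

One trade-off to keep in mind: this version is stricter than the \b-based patterns, so a token followed directly by a comma or full stop is rejected. Also, as far as I know, Python's re module does not accept the (?<=\s|^) alternative, because it requires lookbehinds to have a fixed width, so the double-negative form is the practical choice there.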
Upvotes: 2