Regular expression help needed in python

Question

Can anyone help me form a regex to identify the pattern dd-ddd as a whole word in a sentence e.g. in a sentence like this -

11-222 should be matched at the beginning of the sentence, as well as 33-444 in the middle but not 55-66-777 since the whole word does not match the pattern. If the pattern is present at the end, that should also be matched like 88-999

If I use a regex like '\b\d{2}-\d{3}\b', it also matches 66-777, which is part of 55-66-777. I need to exclude that. It seems the hyphen (-) is treated as a word boundary.

Any idea how I can achieve this?

Added sample code and output

import re
regex_str = r'\b\d{2}-\d{3}\b'
msg_message = '11-222 should be matched, as well as 33-444 but not 55-66-777. If it is present at the end, that should also be matched like 88-999'
for match in re.finditer(regex_str, msg_message):
    print('*'*15)
    print(match.group(0))
    print(match.span())

Output

***************
11-222
(0, 6)
***************
33-444
(37, 43)
***************
66-777
(55, 61)
***************
88-999
(125, 131)

ctwheels · Accepted Answer

You can use (?<!\S)\d{2}-\d{3}(?!\S). This pattern ensures a whitespace character (or no character - i.e. start/end of string) before and after.


How it works:

(?<!\S) ensure what precedes doesn't match a non-whitespace character

\d{2} match two digits
- match this character literally
\d{3} match three digits
(?!\S) ensure what follows doesn't match a non-whitespace character
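To check the breakdown above against the sample from the question, here is a minimal sketch that swaps the pattern (?<!\S)\d{2}-\d{3}(?!\S) into the asker's own loop. The variable names `pattern` and `matches` are only illustrative.

```python
import re

# Pattern from the answer: dd-ddd with no non-whitespace character on either side.
pattern = re.compile(r'(?<!\S)\d{2}-\d{3}(?!\S)')

# Sample sentence taken from the question.
msg_message = ('11-222 should be matched, as well as 33-444 but not 55-66-777. '
               'If it is present at the end, that should also be matched like 88-999')

# Collect each match together with its (start, end) span.
matches = [(m.group(0), m.span()) for m in pattern.finditer(msg_message)]
for text, span in matches:
    print(text, span)

# Prints:
# 11-222 (0, 6)
# 33-444 (37, 43)
# 88-999 (125, 131)
```

Note that 66-777 is no longer reported, because the character before it is a hyphen, which is not whitespace. By the same logic, a token directly followed by punctuation (for example 33-444, with a trailing comma) would not match either, so this approach fits only if "whole word" means "delimited by whitespace".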

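The double negatives are used purposely: "not preceded/followed by a non-whitespace character" is also true when there is no character at all, so the start and end of the string are covered for free. The alternative is to use (?<=\s|^) and (?=\s|$) respectively (but it's longer and less sexy).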
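One caveat for Python specifically: as far as I know, the standard `re` module requires look-behinds to be fixed-width, so (?<=\s|^) is rejected at compile time because \s is one character wide and ^ is zero. The sketch below shows that, plus one possible workaround that moves the alternation outside the look-behind. The rewritten form (?:^|(?<=\s)) is my own adaptation, not part of the original answer.

```python
import re

# Sample sentence taken from the question.
msg_message = ('11-222 should be matched, as well as 33-444 but not 55-66-777. '
               'If it is present at the end, that should also be matched like 88-999')

# The positive look-behind as written mixes widths (\s is 1, ^ is 0),
# which Python's re module refuses to compile.
try:
    re.compile(r'(?<=\s|^)\d{2}-\d{3}(?=\s|$)')
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)
print(lookbehind_error)

# Assumed workaround: keep the look-behind fixed-width and put ^ beside it.
positive = re.compile(r'(?:^|(?<=\s))\d{2}-\d{3}(?=\s|$)')
# The double-negative form from the answer, for comparison.
negative = re.compile(r'(?<!\S)\d{2}-\d{3}(?!\S)')

positive_matches = positive.findall(msg_message)
negative_matches = negative.findall(msg_message)
print(positive_matches)
print(negative_matches)
```

Both patterns return the same three matches on this sample, which is a further argument for the shorter double-negative version in Python.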
