Reputation: 157
I would like to match a word when it comes after the character m or b.
So for example, when the word is men, I would like to return en (only the part that follows m); if the word is beetles, then return eetles.
Initially I tried (m|b)\w+ but it matches the entire men, not en.
How do I write the regex in this case? Thank you!
Upvotes: 1
Views: 1513
Reputation: 3220
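(?<=[mb])\w+
You can use the regex above. It matches one or more word characters that are immediately preceded by m or b.
(?<=[mb]) : positive lookbehind, asserts that the preceding character is m or b
\w+ : matches one or more word characters (equivalent to [a-zA-Z0-9_]+ for ASCII text)
Upvotes: 0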
Reputation: 163207
You could get only the match by using a positive lookbehind that asserts what is on the left is either m or b (the character class [mb]), preceded by a word boundary \b
(?<=\b[mb])\w+
(?<= Positive lookbehind, assert what is directly to the left is
\b[mb] Word boundary, match either m or b
) Close lookbehind
\w+ Match 1+ word chars
If there cannot be anything after the word characters, you can assert a whitespace boundary at the right using (?!\S)
(?<=\b[mb])\w+(?!\S)
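To make the difference between these variants concrete, here is a small sketch that runs the lookbehind without \b, with \b, and with the trailing (?!\S) over the same input. The sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text, chosen so that each variant behaves differently.
text = "men number beetles, bob"

# No word boundary: the lookbehind also succeeds inside "number" (after its "m").
no_boundary = re.findall(r"(?<=[mb])\w+", text)

# With \b: only an m or b at the start of a word counts.
with_boundary = re.findall(r"(?<=\b[mb])\w+", text)

# With (?!\S): the match must be followed by whitespace or the end of the string,
# so "beetles," (followed by a comma) is skipped entirely.
with_ws_boundary = re.findall(r"(?<=\b[mb])\w+(?!\S)", text)

print(no_boundary)       # ['en', 'ber', 'eetles', 'ob']
print(with_boundary)     # ['en', 'eetles', 'ob']
print(with_ws_boundary)  # ['en', 'ob']
```

Note that (?!\S) does not shorten the match to make it fit: a word followed by punctuation is dropped rather than trimmed.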
Example code
import re
test_str = ("beetles men")
regex = r"(?<=\b[mb])\w+"
print(re.findall(regex, test_str))
Output
['eetles', 'en']
Upvotes: 3
Reputation: 626691
You may use
\b[mb](\w+)
See the regex demo.
NOTE: When your known prefixes include multicharacter sequences, say, you want to find words starting with m or be, you will have to use a non-capturing group rather than a character class: \b(?:m|be)(\w+). The current solution can thus be written as \b(?:m|b)(\w+) (however, a character class here looks more natural, unless you have to build the regex dynamically).
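As a rough sketch of building the regex dynamically, the helper below joins a list of prefixes into a non-capturing group. The function name build_suffix_regex and the sample prefixes are hypothetical; re.escape is used on the assumption that prefixes may contain regex metacharacters, and the prefixes are sorted longest first so that a longer prefix such as be is tried before a shorter one such as b.

```python
import re

def build_suffix_regex(prefixes):
    """Hypothetical helper: compile \\b(?:p1|p2|...)(\\w+) from a list of prefixes."""
    # Longest first, so "be" would win over "b" if both were present.
    ordered = sorted(prefixes, key=len, reverse=True)
    # Escape each prefix in case it contains characters like "." or "+".
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")(\w+)")

rx = build_suffix_regex(["m", "be"])
suffixes = rx.findall("men and beetles")
print(suffixes)  # ['en', 'etles']
```

With the prefix be instead of b, the captured part of beetles is etles rather than eetles.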
Details
\b - a word boundary
[mb] - m or b
(\w+) - Capturing group 1: any one or more word chars (letters, digits or underscores). To match only letters, use ([^\W\d_]+) instead.
import re
rx = re.compile(r'\b[mb](\w+)')
text = "The words are men and beetles."
# First occurrence:
m = rx.search(text)
if m:
    print(m.group(1))  # => en
# All occurrences
print(rx.findall(text))  # => ['en', 'eetles']
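The letters-only variant mentioned in the details can be compared with the \w+ version like this. It is only a sketch; the sample string and variable names are invented for the comparison.

```python
import re

rx_word = re.compile(r"\b[mb](\w+)")            # letters, digits or underscores
rx_letters = re.compile(r"\b[mb]([^\W\d_]+)")   # letters only

# Hypothetical input containing a digit and an underscore right after the prefix.
sample = "m4x b_side men"

word_hits = rx_word.findall(sample)
letter_hits = rx_letters.findall(sample)

print(word_hits)    # ['4x', '_side', 'en']
print(letter_hits)  # ['en']
```

[^\W\d_] reads as "a word character that is neither a digit nor an underscore", which leaves only letters.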
Upvotes: 0