Reputation: 20915
I tried to find all the separate m's in a string with a Python regex, using word boundaries. These m's should either have whitespace on both sides or begin/end the string:
r = re.compile("\\bm\\b")
re.findall(r, someString)
However, this also finds m's inside words like I'm, since the apostrophe creates a word boundary. How do I write a regex that doesn't treat apostrophes as word boundaries?
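A minimal reproduction of the problem, assuming Python 3's `re` module. The sample sentences are made up for illustration:

```python
import re

# \b matches between a word character and a non-word character,
# and the apostrophe in "I'm" is a non-word character.
word_m = re.compile(r"\bm\b")

standalone = word_m.findall("I m a boy")   # the m we want
contraction = word_m.findall("I'm a boy")  # the m we don't want

print(standalone)   # ['m']
print(contraction)  # ['m']  (false positive caused by the apostrophe)
```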
I've tried this:
r = re.compile("(\\sm\\s) | (^m) | (m$)")
re.findall(r, someString)
but that just doesn't match any m. Odd.
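A quick check of why this pattern never matches ordinary text: the spaces around each `|` are literal spaces in the pattern, so every alternative demands an extra space. The test strings are made up; note the double space in the second one:

```python
import re

# The spaces around "|" are part of the pattern, so the alternatives are
# "(\sm\s) " (trailing space), " (^m) " and " (m$)" (leading space).
# " (^m) " can never match, because ^ cannot follow a consumed character.
spaced = re.compile("(\\sm\\s) | (^m) | (m$)")

normal = spaced.findall("I m a boy")       # " m " is followed by "a", not a space
double_space = spaced.findall("I m  boy")  # " m " plus the extra literal space

print(normal)        # []
print(double_space)  # [(' m ', '', '')]
```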
Upvotes: 1
Views: 450
Reputation: 31
falsetru's answer is almost the equivalent of "\b except apostrophes", but not quite. It still finds matches where one side of the m has no boundary. Using one of falsetru's examples:
>>> import re
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "mama")
['m']
It finds 'm', but there is no occurrence of 'm' in 'mama' that would match '\bm\b'. The first 'm' matches '\bm', but that's as close as it gets.
The regex that implements "\b without apostrophes" is shown below:
(?<=\s)m(?=\s)|^m(?=\s)|(?<=\s)m$|^m$
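This will find any of the following 4 cases:
- an m with whitespace on both sides: (?<=\s)m(?=\s)
- an m at the start of the string, followed by whitespace: ^m(?=\s)
- an m at the end of the string, preceded by whitespace: (?<=\s)m$
- a string consisting of just m: ^m$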
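A side-by-side check of the three patterns discussed in this thread, as a sketch assuming Python 3's `re`. The sample strings are made up, and `loose`/`strict` are just labels for this comparison:

```python
import re

# Pattern from the question: also fires inside "I'm".
word_boundary = re.compile(r"\bm\b")
# falsetru's version: the bare ^m and m$ alternatives ignore the other side of the m.
loose = re.compile(r"(?<=\s)m(?=\s)|^m|m$")
# "\b without apostrophes": every alternative checks both sides of the m.
strict = re.compile(r"(?<=\s)m(?=\s)|^m(?=\s)|(?<=\s)m$|^m$")

samples = ["I m a boy", "I'm a boy", "mama", "pm", "m", "m is a letter"]

# For each sample: (\b result, loose result, strict result)
results = {
    s: (word_boundary.findall(s), loose.findall(s), strict.findall(s))
    for s in samples
}

for s, (wb, lo, st) in results.items():
    print(f"{s!r}: \\b={wb} loose={lo} strict={st}")
```

Only the strict pattern agrees with `\bm\b` on "mama" and "pm" while still rejecting "I'm".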
Upvotes: 1
Reputation: 12316
You don't even need lookarounds (unless you want to match the m without the surrounding spaces), and your second attempt was inches away. It was the extra spaces around the | characters (fine in Python code, but significant inside a regex) that stopped it from working:
>>> import re
>>> re.findall(r'\sm\s|^m|m$', "I m a boy")
[' m ']
>>> re.findall(r'\sm\s|^m|m$', "mamam")
['m', 'm']
>>> re.findall(r'\sm\s|^m|m$', "mama")
['m']
>>> re.findall(r'\sm\s|^m|m$', "I'm a boy")
[]
>>> re.findall(r'\sm\s|^m|m$', "I'm a boym")
['m']
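If you'd rather keep the spaces for readability, the `re.VERBOSE` flag makes the engine ignore unescaped whitespace in the pattern. A sketch; note that with capturing groups `findall` returns one tuple per match, so the second pattern uses non-capturing `(?:...)` groups to get plain strings back:

```python
import re

# With re.VERBOSE, unescaped whitespace in the pattern is ignored,
# so the spaces around "|" no longer have to appear in the text.
grouped = re.compile(r"(\sm\s) | (^m) | (m$)", re.VERBOSE)
# Non-capturing groups make findall return the whole match instead of tuples.
plain = re.compile(r"(?:\sm\s) | (?:^m) | (?:m$)", re.VERBOSE)

grouped_hits = grouped.findall("I m a boy")
plain_hits = plain.findall("I m a boy")
contraction_hits = plain.findall("I'm a boy")

print(grouped_hits)      # [(' m ', '', '')]
print(plain_hits)        # [' m ']
print(contraction_hits)  # []
```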
Upvotes: 1
Reputation: 369224
Using lookaround assertions:
>>> import re
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "I'm a boy")
[]
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "I m a boy")
['m']
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "mama")
['m']
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "pm")
['m']
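One practical difference from a consuming `\sm\s` pattern: lookarounds don't use up the surrounding spaces, so two m's that share a single space are both found. A sketch with a made-up input:

```python
import re

text = "a m m b"

# \s is consumed, so the space between the two m's is used up by the first match.
consuming = re.findall(r"\sm\s|^m|m$", text)
# Lookarounds only inspect the neighbours, so that space is still available.
lookaround = re.findall(r"(?<=\s)m(?=\s)|^m|m$", text)

print(consuming)   # [' m ']
print(lookaround)  # ['m', 'm']
```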
(?=...)
Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
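The documentation's lookahead example, run against a made-up sentence:

```python
import re

# The lookahead requires "Asimov" to follow, but "Asimov" is not part of the match.
text = "Isaac Asimov, Isaac Newton"
hits = re.findall(r"Isaac (?=Asimov)", text)
first_span = re.search(r"Isaac (?=Asimov)", text).span()

print(hits)        # ['Isaac ']
print(first_span)  # (0, 6)
```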
(?<=...)
Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in 'abcdef', ...
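The lookbehind example in code. Python also requires the lookbehind contents to have a fixed width, which is why (?<=\s) works (always one character) while something like (?<=\s+) is rejected. A sketch:

```python
import re

# The lookbehind checks for "abc" before the match without including it.
match = re.search(r"(?<=abc)def", "abcdef")
print(match.group())  # def
print(match.span())   # (3, 6)

# A variable-width lookbehind such as \s+ raises re.error at compile time.
try:
    re.compile(r"(?<=\s+)m")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(variable_width_ok)  # False
```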
BTW, when you use a raw string (r'this is raw string'), you don't need to escape \.
>>> r'\s' == '\\s'
True
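Why this matters for the \bm\b pattern in particular: in an ordinary Python string, \b is the backspace escape, not a word boundary. A sketch showing what happens if the backslashes are neither doubled nor protected by r:

```python
import re

# A raw string and an escaped normal string give the regex engine the same text.
same_pattern = r"\bm\b" == "\\bm\\b"

# Without r or doubled backslashes, "\b" is a backspace character (\x08),
# so this pattern looks for backspace-m-backspace rather than word boundaries.
unescaped = "\bm\b"
is_backspace = unescaped == "\x08m\x08"

unescaped_hits = re.findall(unescaped, "I m a boy")
raw_hits = re.findall(r"\bm\b", "I m a boy")

print(same_pattern)    # True
print(is_backspace)    # True
print(unescaped_hits)  # []
print(raw_hits)        # ['m']
```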
Upvotes: 3