Reputation: 5804
How to use regular expressions to only capture a word by itself rather than the word inside another word?
For example, in "Company & Co" I'd like to replace only the standalone "Co", not the "Co" at the start of "Company".
import re
re.subn('Co','',"Company & Co")
>> ('mpany & ', 2)  # which I don't want
>> "Company & "  # desired result
Upvotes: 0
Views: 72
Reputation: 2532
A "word by itself" means the word is surrounded by whitespace or sits at the beginning or end of the string. So:
re.subn(r'(\s|^)Co(\s|$)', r'\g<1>\g<2>', "Company & Co")
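A quick way to see how this whitespace-based pattern behaves is to run it over a few inputs. This is only a sketch: the pattern is the one above written with raw strings, and the sample strings (including the one with a trailing period) are my own additions for illustration.

```python
import re

# Pattern from the answer above, written as raw strings.
pattern = r'(\s|^)Co(\s|$)'
replacement = r'\g<1>\g<2>'  # put the captured whitespace back

# Sample inputs are illustrative assumptions, not from the question.
samples = ["Company & Co", "Co and Company", "Company & Co."]
results = [re.subn(pattern, replacement, s) for s in samples]

for sample, result in zip(samples, results):
    print(repr(sample), '->', result)
```

The first two strings lose their standalone "Co", but the third comes back unchanged because the "Co" is followed by a period rather than whitespace or the end of the string.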
Upvotes: 3
Reputation: 338228
You want word boundaries.
They are expressed with \b in most regex dialects (and with \< and \> in some). Python uses \b.
import re
re.subn(r'\bCo\b', '', "Company & Co")
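Note the r in front of the pattern: it makes the string raw, so \b reaches the regex engine as a word boundary instead of being turned into a backspace character by Python.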
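To make the word-boundary approach reusable, something like the sketch below might help. The helper name remove_word is hypothetical, and the second sample string with a trailing period is my own assumption; the last call shows what happens when the r prefix is left off.

```python
import re

def remove_word(word, text):
    """Delete standalone occurrences of `word` (hypothetical helper)."""
    # re.escape guards against regex metacharacters inside `word`.
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.sub(pattern, '', text)

plain = remove_word("Co", "Company & Co")
dotted = remove_word("Co", "Company & Co.")

# Without the r prefix, '\b' is a backspace character, so nothing matches.
not_raw = re.subn('\bCo\b', '', "Company & Co")

print(repr(plain))
print(repr(dotted))
print(not_raw)
```

Because \b also treats punctuation as a boundary, the "Co" before the period is removed too, leaving the period behind. The helper assumes `word` starts and ends with a word character; otherwise \b would not line up with its edges.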
Upvotes: 3
Reputation: 36514
Use the r"\b" expression to match the empty string at the beginning or end of what you're looking for, to ensure that it's a whole word and not part of another word:
>>> import re
>>> pat1 = re.compile("Co")
>>> pat2 = re.compile(r"\bCo\b")
>>> pat1.match("Company")
<_sre.SRE_Match object at 0x106b92780>
>>> pat2.search("Company")
# (fails)
>>> pat2.search("Co")
<_sre.SRE_Match object at 0x106b927e8>
>>> pat2.search("Co & Something")
<_sre.SRE_Match object at 0x106b92780> # succeeds
This syntax works whether the boundary between what you're looking for is:
Upvotes: 0
Reputation: 1473
What about this:
import re
print(re.subn('Co$', '', "Company & Co"))
Characters like $ are called metacharacters; here $ anchors the match to the end of the string. They are very useful and worth looking at.
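For comparison, here is a small sketch that runs the end-of-string anchor next to the word-boundary pattern from the other answers. The extra sample strings are made up for illustration and are not from the question.

```python
import re

# Illustrative inputs (assumptions, not from the question).
texts = ["Company & Co", "Co & Company", "Costco & SysCo"]

anchored = {}  # results for 'Co$'
bounded = {}   # results for '\bCo\b'

for text in texts:
    anchored[text] = re.subn(r'Co$', '', text)
    bounded[text] = re.subn(r'\bCo\b', '', text)
    print(repr(text), anchored[text], bounded[text])
```

The $ anchor only looks at the very end of the string, so it misses a standalone "Co" elsewhere and also strips "Co" from the end of a longer word such as "SysCo". It does the job for the exact string in the question, but it is not a general whole-word match.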
Upvotes: 1