Reputation: 29
Is there a way to improve this regular expression to search for all words that end with t, including don't? I also want to print the whole words, not just the last t.
r"\b\w*\Wt\b|\b\w*t\b"
I had to write out two separate cases for words ending with either t or 't. Or is this the best it can be?
Upvotes: 2
Views: 60
Reputation: 18621
Do not rely on generic patterns if all you want is to allow an apostrophe. \W matches spaces, too. \S matches any character that is not whitespace.
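A small sketch of the point about \W, using a made-up sample string in which "cat" and "t" are separate words. The question's first branch lets \W consume the space between them, so the two words come back as a single match.

```python
import re

# The question's pattern. The first branch uses \W to allow an apostrophe,
# but \W matches any non-word character, including a space.
question_pattern = r"\b\w*\Wt\b|\b\w*t\b"

# Made-up sample: "cat" and "t" are two separate words here.
sample = "cat t don't"
matches = re.findall(question_pattern, sample)
print(matches)  # ['cat t', "don't"]
```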
Use
r"\b\w+'?t\b"
EXPLANATION
--------------------------------------------------------------------------------
  \b    the boundary between a word char (\w) and something that is not a word char
--------------------------------------------------------------------------------
  \w+   word characters (a-z, A-Z, 0-9, _; with Python 3 str patterns also non-ASCII letters and digits) (1 or more times, matching as many as possible)
--------------------------------------------------------------------------------
  '?    an apostrophe, ' (optional, matched if present)
--------------------------------------------------------------------------------
  t     the letter 't'
--------------------------------------------------------------------------------
  \b    the boundary between a word char (\w) and something that is not a word char
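A minimal sketch of the suggested pattern with Python's re module. The sample string is made up: it extends the one used in the other answer with "cat t" to show that the space is no longer swallowed. Because \w+ needs at least one character before the final t, a lone "t" is not returned.

```python
import re

# One or more word characters, an optional apostrophe, then a final "t"
# followed by a word boundary.
pattern = re.compile(r"\b\w+'?t\b")

# Made-up sample text for illustration.
sample = "mitt cat bat don't foobar cat t"
found = pattern.findall(sample)
print(found)  # ['mitt', 'cat', 'bat', "don't", 'cat']
```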
Upvotes: 0
Reputation: 57115
I'd use \b\S*t\b. It avoids making the engine scan a whole word, fail to find the non-word character your first branch needs, and then retry the same word with the other branch of your pattern. At the very least, swap the two sides of the alternation, because in the common case the word won't contain a contraction.
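A quick check of the alternatives on the sample string used below: the question's pattern, the same pattern with its branches swapped, and \b\S*t\b. It only compares which words are found; it does not measure speed (the timeit module would be the tool for that).

```python
import re

# The question's pattern: contraction branch first, plain-word branch second.
original = r"\b\w*\Wt\b|\b\w*t\b"
# The same two branches in swapped order, so the plain-word case is tried first.
swapped = r"\b\w*t\b|\b\w*\Wt\b"
# The single-branch alternative suggested in this answer.
non_space = r"\b\S*t\b"

sample = "mitt cat bat don't foobar"
patterns = {"original": original, "swapped": swapped, "non_space": non_space}

# Collect the matches for each pattern so they can be compared side by side.
results = {name: re.findall(pat, sample) for name, pat in patterns.items()}
for name, found in results.items():
    print(f"{name:10} {found}")
```

On this sample all three agree. On input where \W can swallow a space, such as "cat t", the two orderings of the question's pattern return different matches (['cat t'] versus ['cat', ' t']), which is another reason to avoid \W here.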
>>> import re
>>> s = "mitt cat bat don't foobar"
>>> re.findall(r"\b\S*t\b", s)
['mitt', 'cat', 'bat', "don't"]
It's not clear how you want to treat non-word punctuation, but consider a variant that attempts to handle this:
>>> s = "mitt cat bat. don't foobar tee t e.t."
>>> re.findall(r"\b\S*t\b", s)
['mitt', 'cat', 'bat', "don't", 't', 'e.t']
>>> re.findall(r"\b[^.,!?\s]*t\b", s)
['mitt', 'cat', 'bat', "don't", 't', 't']
Clearly, abbreviations and edge cases may need attention if that's part of your specification.
Upvotes: 2