Reputation: 1502
For a typical set of word suffixes (ize, fy, ly, able, etc.), I want to know whether a given word ends with any of them, and then remove it. I know this can be done iteratively, with word.endswith('ize') for example, but I believe there is a neater regex way of doing it. I tried a positive lookahead with the end marker $, but for some reason it didn't work:
pat='(?=ate|ize|ify|able)$'
word='terrorize'
re.findall(pat,word)
Upvotes: 1
Views: 9650
Reputation: 375484
Little-known fact: endswith accepts a tuple of possibilities:
if word.endswith(('ate', 'ize', 'ify', 'able')):
    # ...
Unfortunately, it doesn't indicate which string was found, so it doesn't help with removing the suffix.
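If you want to stay with endswith and still learn which suffix matched, one option is to test the suffixes one at a time. This is only a minimal sketch: strip_suffix and SUFFIXES are hypothetical names that are not part of the original answer, and the first matching suffix in the tuple wins.

```python
SUFFIXES = ('ate', 'ize', 'ify', 'able')  # suffix list taken from the question


def strip_suffix(word, suffixes=SUFFIXES):
    """Hypothetical helper: return (stem, suffix); suffix is '' if none matched."""
    for suffix in suffixes:
        if word.endswith(suffix):
            # Slice off exactly the matched suffix.
            return word[:-len(suffix)], suffix
    return word, ''


print(strip_suffix('terrorize'))  # ('terror', 'ize')
print(strip_suffix('walk'))       # ('walk', '')
```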
Upvotes: 5
Reputation: 10224
You need to adjust the parentheses so that $ is inside the lookahead. Just change pat from:
(?=ate|ize|ify|able)$
to:
(?=(ate|ize|ify|able)$)
If you need to remove the suffix later, you could use this pattern, which also captures the stem:
^(.*)(?=(ate|ize|ify|able)$)
Test in REPL:
>>> pat = '^(.*)(?=(ate|ize|ify|able)$)'
>>> word = 'terrorize'
>>> re.findall(pat, word)
[('terror', 'ize')]
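To see why moving $ matters, the two patterns can be compared side by side. This is a small sketch using the word from the question; the variable names broken and fixed are made up for illustration.

```python
import re

word = 'terrorize'

# Original pattern: the lookahead consumes nothing, so $ has to hold at the
# same position where a suffix is still ahead. That can never be true.
broken = re.findall(r'(?=ate|ize|ify|able)$', word)

# Adjusted pattern: $ sits inside the lookahead, right after the alternation.
fixed = re.findall(r'(?=(ate|ize|ify|able)$)', word)

print(broken)  # []
print(fixed)   # ['ize']
```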
Upvotes: 1
Reputation: 873
What you are looking for is actually a non-capturing group, (?:...)
Check this out:
re.sub(r"(?:ate|ize|ify|able)$", "", "terrorize")
Have a look at this site Regex.
There are tons of useful regex tricks there. Hope you enjoy it.
BTW, the Python library itself is a neat and wonderful tutorial.
I do help() a lot :)
Upvotes: 2
Reputation: 4976
If you are matching word by word, simply remove the lookahead check; the $ anchor is sufficient.
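A rough sketch of that idea, assuming each call receives a single word. SUFFIX_RE and remove_suffix are hypothetical names chosen for this example.

```python
import re

# Compiled once; the suffix list is the one from the question.
SUFFIX_RE = re.compile(r'(?:ate|ize|ify|able)$')


def remove_suffix(word):
    # For a single word, $ alone pins the alternation to the end.
    return SUFFIX_RE.sub('', word)


print(remove_suffix('terrorize'))  # terror
print(remove_suffix('enabler'))    # enabler ('able' is not at the end)
```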
Upvotes: 0
Reputation: 1121256
A lookahead is an anchor pattern: just like ^ and $, it anchors a match to a specific location but is not itself part of the match.
You want to match these suffixes, but only at the end of a word, so use the word-boundary anchor \b instead:
r'(ate|ize|ify|able)\b'
Then use re.sub() to remove them:
re.sub(r'(ate|ize|ify|able)\b', '', word)
which works just fine:
>>> word='terrorize'
>>> re.sub(r'(ate|ize|ify|able)\b', '', word)
'terror'
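The difference between \b and $ only shows up when the input holds more than one word. The sketch below uses a made-up two-word string to illustrate it; text, every_word and last_word are names invented for this example.

```python
import re

text = 'terrorize and simplify'

# \b matches at every word boundary, so each word loses its suffix.
every_word = re.sub(r'(ate|ize|ify|able)\b', '', text)

# $ matches only at the end of the string, so only the last word changes.
last_word = re.sub(r'(ate|ize|ify|able)$', '', text)

print(every_word)  # terror and simpl
print(last_word)   # terrorize and simpl
```

Which one you want depends on whether you pass in single words or whole sentences.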
Upvotes: 2