Reputation: 31
I am dealing with a Chinese NLP problem. I need to find words that have specific suffixes. For example, I have two lists:
suffixs = ['aaa', 'bbb', 'cc', ...]
words_list = ['oneaaa', 'twobbb', 'three', 'four']
for w in words_list:
    if w has a suffix s in suffixs:  # pseudocode
        func(s, w)
I know I can use the re package, but re can only deal with fewer than 100 suffixes, and I have 1000+ suffixes. I tried:
for w in words_list:
    # suffixs is sorted by length
    for s in suffixs:
        if w.endswith(s):
            func(s, w)
            break
But it is too slow.
func(s, w) splits the word w into the word without its suffix and the suffix itself, for example 'oneaaa' into ['one', 'aaa']. However, func depends on some conditions and is more complex than that, so any doesn't work here.
So I want to know whether there is a better way to deal with this.
Upvotes: 0
Views: 629
Reputation: 82939
If you just want to see which words have "back-fixes" (the correct term is suffix, BTW), you can just use str.endswith in combination with any:
for w in words_list:
    if any(w.endswith(b) for b in back_fixs):
        print(w)
Or pass all the suffixes to endswith, but for that they have to be in a tuple, not a list:
back_fixs = tuple(back_fixs)
for w in words_list:
    if w.endswith(back_fixs):
        print(w)
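As a self-contained illustration of the tuple form, here is a minimal sketch using the sample data from the question. The helper name words_with_suffix is made up for this example, and the three suffixes stand in for the real, much longer list.

```python
# Sample data borrowed from the question; the real suffix list is much longer.
back_fixs = ('aaa', 'bbb', 'cc')
words_list = ['oneaaa', 'twobbb', 'three', 'four']


def words_with_suffix(words, suffixes):
    """Return the words that end with at least one of the given suffixes.

    Hypothetical helper, not part of any library.
    """
    # str.endswith accepts a single string or a tuple of strings, not a list.
    suffixes = tuple(suffixes)
    return [w for w in words if w.endswith(suffixes)]


print(words_with_suffix(words_list, back_fixs))  # ['oneaaa', 'twobbb']
```

This only tells you that some suffix matched, not which one.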
If you also need to know which suffix matches, you can get the next matching one, or None if none match:
for w in words_list:
    b = next((b for b in back_fixs if w.endswith(b)), None)
    if b:
        print(w, b)
Or shorter, using filter: b = next(filter(w.endswith, back_fixs), None)
Or without a default, using try/except:
for w in words_list:
    try:
        print(w, next(filter(w.endswith, back_fixs)))
    except StopIteration:
        pass
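If scanning 1000+ suffixes per word is still too slow, one possible alternative (a different technique from the ones above, offered only as a sketch) is to put the suffixes in a set and test the tail of each word once per distinct suffix length. The names make_splitter and split are hypothetical, and the simple split below is only a stand-in for the more complex, condition-based func from the question.

```python
def make_splitter(suffixes):
    """Build a function that splits a word into (stem, suffix).

    Hypothetical helper: it tries the longest suffix lengths first and
    returns None when no suffix matches.
    """
    suffix_set = set(suffixes)
    # Distinct non-zero suffix lengths, longest first.
    lengths = sorted({len(s) for s in suffix_set if s}, reverse=True)

    def split(word):
        for n in lengths:
            # One set lookup per length instead of one endswith per suffix.
            if n <= len(word) and word[-n:] in suffix_set:
                return word[:-n], word[-n:]
        return None

    return split


split = make_splitter(['aaa', 'bbb', 'cc'])
for w in ['oneaaa', 'twobbb', 'three', 'four']:
    print(w, split(w))
```

The work per word then depends on the number of distinct suffix lengths rather than on the number of suffixes, which may help when many suffixes share a few lengths.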
Upvotes: 1