Cauchyzhou

Reputation: 31

Python: find words that have specific suffixes

I am working on a Chinese NLP problem and need to find the words that have specific suffixes. For example, I have two lists:

suffixs = ['aaa','bbb','cc'.....]

words_list = ['oneaaa','twobbb','three','four']

for w in words_list:
    if w ends with some suffix s in suffixs:  # pseudocode
        func(s, w)

I know I can use the re package, but as far as I can tell re can only handle fewer than 100 suffixes, and I have more than 1,000. I tried:

for w in words_list:
    for s in suffixs:  # suffixs is sorted by length
        if w.endswith(s):
            func(s, w)
            break

But this is too slow.
func(s, w) splits the word w into the word without its suffix and the suffix itself, for example 'oneaaa' into ['one', 'aaa']. However, func depends on some conditions and is more complex, so any doesn't work here.
Is there a better way to do this?

Upvotes: 0

Views: 629

Answers (1)

tobias_k

Reputation: 82939

If you just want to see which words have "back-fixes" (the correct term is suffix, BTW), you can use str.endswith in combination with any:

for w in words_list:
    if any(w.endswith(b) for b in back_fixs):
        print(w)

Or pass all the suffixes to endswith at once; for that they have to be in a tuple, not a list:

back_fixs = tuple(back_fixs)
for w in words_list:
    if w.endswith(back_fixs):
        print(w)

If you also need to know which suffix matches, you can use next to get the first match, or None if none match:

for w in words_list:
    b = next((b for b in back_fixs if w.endswith(b)), None)
    if b:
        print(w, b)

Or shorter using filter: b = next(filter(w.endswith, back_fixs), None)
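For illustration, here is the filter variant as a small self-contained sketch, assuming Python 3 (where filter is lazy) and the sample data from the question. The helper name first_suffix is only a placeholder, not something from the question:

```python
# Sample data taken from the question; back_fixs holds the suffixes.
back_fixs = ['aaa', 'bbb', 'cc']
words_list = ['oneaaa', 'twobbb', 'three', 'four']

def first_suffix(w, suffixes):
    """Return the first entry of `suffixes` that `w` ends with, or None.

    `first_suffix` is a hypothetical helper name used for this sketch.
    """
    # filter() is lazy in Python 3, so next() stops at the first match
    return next(filter(w.endswith, suffixes), None)

# Map every word to its first matching suffix (None when nothing matches)
matches = {w: first_suffix(w, back_fixs) for w in words_list}
print(matches)
```

Note that this returns the first match in list order, so if several suffixes can match the same word, the order of back_fixs decides which one you get.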

Or without default, using try/except:

for w in words_list:
    try:
        print(w, next(filter(w.endswith, back_fixs)))
    except StopIteration:
        pass
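Putting the pieces together, one possible sketch for the splitting case from the question: use the tuple form of endswith as a cheap pre-check, then look up which suffix matched. The function split_suffix is a hypothetical stand-in for the question's func(s, w), and sorting the suffixes longest first is an assumption about which match is wanted:

```python
def split_suffix(word, suffixes):
    """Split `word` into [stem, suffix] using the first matching suffix.

    Returns None when no suffix matches. `suffixes` must be a tuple;
    if it is sorted longest first, the first match is the longest one.
    `split_suffix` is a hypothetical stand-in for the question's func.
    """
    # Pre-check: str.endswith accepts a tuple and tests all entries in one call
    if not word.endswith(suffixes):
        return None
    # A match is guaranteed here, so next() needs no default
    s = next(s for s in suffixes if word.endswith(s))
    return [word[:len(word) - len(s)], s]

# Sample data from the question, sorted longest first (an assumption)
back_fixs = tuple(sorted(['aaa', 'bbb', 'cc'], key=len, reverse=True))
words_list = ['oneaaa', 'twobbb', 'three', 'four']

results = [split_suffix(w, back_fixs) for w in words_list]
print(results)
```

Words without any matching suffix are rejected by the single endswith call, so the per-suffix search only runs for words that actually match.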

Upvotes: 1
