

How to count the number of words ending with the same suffix (word ending)?

I am trying, first, to divide up four-letter words based on their last two letters (the suffix) and, second, to count how many words I have for each of these endings.

I have a list of 3,164 words called filtered, and I have sorted them by their suffixes, which doesn't seem to help much.

(I want to create a dictionary that maps each suffix to a list of the words ending with it, but I don't know where to begin!) It would look something like this:


dic = {'ab': ['Ahab', 'Arab', 'Saab', ...], 'al': ['Aral', 'Baal', ...]}

and so on. Would that be possible?

filtered.sort(key=lambda x: x[-2:])

['HSBC', 'UCLA', 'FNMA', 'SARS', 'OHSA', 'Ahab', 'Arab', 'Saab', 'blab', 'crab', 'drab', 'flab', 'grab', 'scab', 'slab', 'stab', 'swab', 'Brad', 'Chad', 'Head', 'Mead', 'Thad', 'Vlad', 'bead', 'brad', 'clad', 'dead', 'glad', 'goad', 'grad', 'head', 'iPad', 'lead', 'load', 'mead', 'quad', 'read', 'road', 'scad', 'shad', 'toad', 'Olaf', 'Piaf', 'deaf', 'leaf', 'loaf', 'brag', 'crag', 'drag', 'flag', 'shag', 'slag', 'snag', 'stag', 'swag', 'Leah', 'Noah', 'Ptah', 'Utah', 'blah', 'shah', 'yeah', 'Thai', 'beak', 'flak', 'leak', 'peak', 'soak', 'teak', 'weak', 'Aral', 'Baal', 'Dial', 'Neal', 'Opal', 'Ural', 'anal', 'coal', 'deal', 'dial', 'dual', 'foal', 'goal', 'heal', 'meal', 'opal', 'oral', 'oval', 'peal', 'real', 'seal', 'teal', 'veal', 'vial', 'weal', 'zeal', 'Adam', 'Edam', 'Elam', 'Guam', 'Siam', 'Spam', 'beam', 'clam', 'cram', 'dram', 'exam', 'foam', 'gram', 'imam', 'loam', 'pram', 'ream', 'roam', 'scam', 'seam', 'sham', 'slam', 'swam', 'team', 'tram', 'wham', 'Adan', 'Alan', 'Bean', 'Bran', 'Chan', 'Dean', 'Evan', 'Fran', 'Iran', 'Ivan', 'Jean', 'Joan', 'Juan', 'Khan', 'Klan', 'Kwan', 'Lean', 'Oman', 'Oran', 'Ryan', 'Sean', 'Sian', 'Stan', 'Tran', 'Yuan', 'bean', 'bran', 'clan', 'dean', 'flan', 'khan', 'lean', 'loan', 'mean', 'moan', 'plan', 'roan', 'scan', 'span', 'swan', 'than', 'wean', 'chap', 'clap', 'crap', 'flap', 'heap', 'leap', 'reap', 'slap', 'snap', 'soap', 'swap', 'trap', 'wrap', 'Iraq', 'Adar', 'Alar', 'Iyar', 'Lear', 'Omar', 'Paar', 'Saar', 'Thar', 'afar', 'agar', 'ajar', 'bear', 'boar', 'char', 'czar', 'dear', 'fear', 'gear', 'hear', 'liar', 'near', 'pear', 'rear', 'roar', 'scar', 'sear', 'soar', 'spar', 'star', 'tear', 'tsar', 'tzar', 'wear', 'year', 'Boas', 'Haas', 'Xmas', 'alas', 'baas', 'bias', 'boas', 'bras', 'eras', 'leas', 'peas', 'seas', 'spas', 'teas', 'yeas', 'Fiat', 'beat', 'boat', 'brat', 'chat', 'coat', 'feat', 'fiat', 'flat', 'frat', 'gnat', 'goat', 'heat', 'meat', 'moat', 'neat', 'peat', 'scat', 'seat'...]

Upvotes: 1

Views: 322

Answers (2)


Reputation: 1006

I have two solutions for you.

Solution 1:

>>> from itertools import groupby
>>> key_func = lambda s: s[-2:]
>>> suffix_dict = {suffix: list(words) for suffix, words in groupby(sorted(filtered, key=key_func), key=key_func)}
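To see why Solution 1 sorts before grouping, here is a minimal, self-contained sketch on a small hypothetical sample (the word list and the helper name suffix are illustrative, not from the question). groupby only merges consecutive items with equal keys, so on unsorted input the same suffix shows up in several groups, and passing those pairs to dict() silently keeps only the last group for each suffix:

```python
from itertools import groupby


def suffix(word):
    """Return the last two characters of word (the 'suffix' in the question)."""
    return word[-2:]


# Hypothetical sample, deliberately not sorted by suffix.
words = ['crab', 'dead', 'Arab', 'glad', 'scab']

# Unsorted input: groupby starts a new group every time the key changes.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=suffix)]

# Turning those pairs into a dict keeps only the last group per suffix.
lossy = dict(unsorted_groups)

# Sorted by the same key first: exactly one group per suffix.
suffix_dict = {k: list(g) for k, g in groupby(sorted(words, key=suffix), key=suffix)}

print(unsorted_groups)
# [('ab', ['crab']), ('ad', ['dead']), ('ab', ['Arab']), ('ad', ['glad']), ('ab', ['scab'])]
print(lossy)        # {'ab': ['scab'], 'ad': ['glad']}
print(suffix_dict)  # {'ab': ['crab', 'Arab', 'scab'], 'ad': ['dead', 'glad']}
```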

Solution 2:

>>> from collections import defaultdict
>>> suffix_dict = defaultdict(list)
>>> for word in filtered:
...     suffix_dict[word[-2:]].append(word)

Solution 2 should be somewhat faster than Solution 1, mainly because it makes a single pass over the list, whereas Solution 1 has to sort it first (O(n log n) instead of O(n)). A defaultdict works like a normal Python dict, except that looking up a missing key with [] does not raise a KeyError: it calls its default factory (here list) to create, store and return a default value. It is also often faster than the equivalent plain-dict idioms (an explicit membership check, or setdefault) when there are a lot of update operations; several benchmarks have shown defaultdict beating a normal dict for this kind of workload (see 1 and 2).
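A minimal sketch of the missing-key behaviour of defaultdict (the keys and values are arbitrary examples). It shows that only lookups with [] call the default factory, that such a lookup also inserts the key, and that a plain dict copy, or a defaultdict without a factory, raises KeyError as usual:

```python
from collections import defaultdict

groups = defaultdict(list)   # default_factory is list
groups['ab'].append('crab')  # 'ab' is missing, so list() is called and stored first

# Reading a missing key with [] also inserts it, with an empty list as value.
empty = groups['zz']
print(sorted(groups))        # ['ab', 'zz']

# `in` and .get() do not call default_factory, so they insert nothing.
print('qq' in groups, groups.get('qq'))  # False None

# Converting to a plain dict restores the usual KeyError for missing keys.
plain = dict(groups)
try:
    plain['qq']
except KeyError:
    print('KeyError from plain dict')

# A defaultdict whose default_factory is None also raises KeyError.
no_factory = defaultdict(None)
try:
    no_factory['ab']
except KeyError:
    print('KeyError from defaultdict(None)')
```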

Benefits of Solution 1 over the other solution: groupby is given the data to group and a key function to group by. Because it only groups consecutive items with equal keys, the data must first be sorted by that same key function. As a result, the suffixes in the final suffix_dict appear in sorted order (dicts preserve insertion order since Python 3.7). Within each group, the words keep their relative order from the original list filtered, because Python's sort is stable; the same is true in Solution 2, where the suffixes instead appear in the order in which they first occur in filtered.
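The ordering difference can be checked with a small sketch; the sample list and the helper name suffix are hypothetical. Both approaches produce the same word lists, but the groupby version lists its suffixes in sorted order while the defaultdict version lists them in order of first appearance (this relies on dicts preserving insertion order, guaranteed since Python 3.7):

```python
from collections import defaultdict
from itertools import groupby


def suffix(word):
    """Return the last two characters of word."""
    return word[-2:]


# Hypothetical sample, deliberately not sorted by suffix.
filtered = ['glad', 'crab', 'Arab', 'dead', 'blab']

# Solution 1: sort by suffix (a stable sort), then group.
sol1 = {k: list(g) for k, g in groupby(sorted(filtered, key=suffix), key=suffix)}

# Solution 2: a single pass with defaultdict.
sol2 = defaultdict(list)
for word in filtered:
    sol2[suffix(word)].append(word)

print(list(sol1))    # ['ab', 'ad']  (sorted suffixes)
print(list(sol2))    # ['ad', 'ab']  (order of first appearance)
print(sol1 == sol2)  # True: the word lists are identical
print(sol1['ab'])    # ['crab', 'Arab', 'blab']  (original relative order)
```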

Also, the performance difference between the two solutions is marginal, especially if your original list is small.
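If you want to measure the difference yourself, here is a rough timeit sketch. The data is an assumption: random four-letter lowercase strings standing in for the real list, at the question's size of 3,164 words. Timings depend on your machine and Python version, so none are quoted here:

```python
import random
import string
import timeit
from collections import defaultdict
from itertools import groupby


def suffix(word):
    return word[-2:]


def with_groupby(words):
    """Solution 1: sort by suffix, then group consecutive equal suffixes."""
    return {k: list(g) for k, g in groupby(sorted(words, key=suffix), key=suffix)}


def with_defaultdict(words):
    """Solution 2: one pass, appending to a defaultdict(list)."""
    groups = defaultdict(list)
    for word in words:
        groups[suffix(word)].append(word)
    return groups


# Synthetic stand-in data (an assumption, not the asker's real list).
random.seed(0)
words = [''.join(random.choices(string.ascii_lowercase, k=4)) for _ in range(3164)]

# Sanity check: both solutions build the same mapping.
assert with_groupby(words) == with_defaultdict(words)

for func in (with_groupby, with_defaultdict):
    seconds = timeit.timeit(lambda: func(words), number=100)
    print(f'{func.__name__}: {seconds:.3f} s for 100 runs')
```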

So you can choose whichever suits your needs better.

The counting part is easy:

>>> {k: len(v) for k, v in suffix_dict.items()}
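If only the counts are needed, a different standard-library tool, collections.Counter, can count the suffixes directly without building the word lists first. A minimal sketch on a hypothetical sample standing in for filtered:

```python
from collections import Counter

# Hypothetical sample standing in for `filtered`.
filtered = ['crab', 'scab', 'dead', 'glad', 'head', 'Utah']

# Count each two-letter suffix in a single pass.
suffix_counts = Counter(word[-2:] for word in filtered)

print(suffix_counts['ad'])           # 3
print(suffix_counts['zz'])           # 0 (missing keys count as zero)
print(suffix_counts.most_common(1))  # [('ad', 3)]
```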




Upvotes: 1


Reputation: 106543

Assuming that suffixes are always two letters long and are case-sensitive, you can iterate through the word list and append each word to a dict of lists, using the last two letters of the word as the key:

dic = {}
for word in filtered:
    dic.setdefault(word[-2:], []).append(word)
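If the case-sensitivity assumption does not fit your data, one option is to normalise the key with str.lower(), so that, for example, 'OHSA' and a lowercase word ending in "sa" share one group. A minimal sketch comparing both keys on a made-up sample (the words are hypothetical):

```python
# Hypothetical sample mixing acronyms and ordinary words.
filtered = ['HSBC', 'Ahab', 'crab', 'OHSA', 'visa']

# Case-sensitive keys, as in the answer above.
case_sensitive = {}
for word in filtered:
    # setdefault returns the existing list, or inserts [] first. Note that the
    # [] argument is built on every call, even when the key already exists.
    case_sensitive.setdefault(word[-2:], []).append(word)

# Case-insensitive keys: lower-case the suffix before using it.
case_insensitive = {}
for word in filtered:
    case_insensitive.setdefault(word[-2:].lower(), []).append(word)

print(case_sensitive)    # {'BC': ['HSBC'], 'ab': ['Ahab', 'crab'], 'SA': ['OHSA'], 'sa': ['visa']}
print(case_insensitive)  # {'bc': ['HSBC'], 'ab': ['Ahab', 'crab'], 'sa': ['OHSA', 'visa']}
```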

dic becomes:

{'BC': ['HSBC'], 'LA': ['UCLA'], 'MA': ['FNMA'], 'RS': ['SARS'], 'SA': ['OHSA'], 'ab': ['Ahab', 'Arab', 'Saab', 'blab', 'crab', 'drab', 'flab', 'grab', 'scab', 'slab', 'stab', 'swab'], 'ad': ['Brad', 'Chad', 'Head', 'Mead', 'Thad', 'Vlad', 'bead', 'brad', 'clad', 'dead', 'glad', 'goad', 'grad', 'head', 'iPad', 'lead', 'load', 'mead', 'quad', 'read', 'road', 'scad', 'shad', 'toad'], 'af': ['Olaf', 'Piaf', 'deaf', 'leaf', 'loaf'], 'ag': ['brag', 'crag', 'drag', 'flag', 'shag', 'slag', 'snag', 'stag', 'swag'], 'ah': ['Leah', 'Noah', 'Ptah', 'Utah', 'blah', 'shah', 'yeah'], 'ai': ['Thai'], 'ak': ['beak', 'flak', 'leak', 'peak', 'soak', 'teak', 'weak'], 'al': ['Aral', 'Baal', 'Dial', 'Neal', 'Opal', 'Ural', 'anal', 'coal', 'deal', 'dial', 'dual', 'foal', 'goal', 'heal', 'meal', 'opal', 'oral', 'oval', 'peal', 'real', 'seal', 'teal', 'veal', 'vial', 'weal', 'zeal'], 'am': ['Adam', 'Edam', 'Elam', 'Guam', 'Siam', 'Spam', 'beam', 'clam', 'cram', 'dram', 'exam', 'foam', 'gram', 'imam', 'loam', 'pram', 'ream', 'roam', 'scam', 'seam', 'sham', 'slam', 'swam', 'team', 'tram', 'wham'], 'an': ['Adan', 'Alan', 'Bean', 'Bran', 'Chan', 'Dean', 'Evan', 'Fran', 'Iran', 'Ivan', 'Jean', 'Joan', 'Juan', 'Khan', 'Klan', 'Kwan', 'Lean', 'Oman', 'Oran', 'Ryan', 'Sean', 'Sian', 'Stan', 'Tran', 'Yuan', 'bean', 'bran', 'clan', 'dean', 'flan', 'khan', 'lean', 'loan', 'mean', 'moan', 'plan', 'roan', 'scan', 'span', 'swan', 'than', 'wean'], 'ap': ['chap', 'clap', 'crap', 'flap', 'heap', 'leap', 'reap', 'slap', 'snap', 'soap', 'swap', 'trap', 'wrap'], 'aq': ['Iraq'], 'ar': ['Adar', 'Alar', 'Iyar', 'Lear', 'Omar', 'Paar', 'Saar', 'Thar', 'afar', 'agar', 'ajar', 'bear', 'boar', 'char', 'czar', 'dear', 'fear', 'gear', 'hear', 'liar', 'near', 'pear', 'rear', 'roar', 'scar', 'sear', 'soar', 'spar', 'star', 'tear', 'tsar', 'tzar', 'wear', 'year'], 'as': ['Boas', 'Haas', 'Xmas', 'alas', 'baas', 'bias', 'boas', 'bras', 'eras', 'leas', 'peas', 'seas', 'spas', 'teas', 'yeas'], 'at': ['Fiat', 'beat', 'boat', 'brat', 'chat', 'coat', 'feat', 'fiat', 'flat', 'frat', 'gnat', 'goat', 'heat', 'meat', 'moat', 'neat', 'peat', 'scat', 'seat']}

Upvotes: 2
