Keely Aranyos

Reputation: 209

How to get the 10 most frequent strings in a list in Python

I have a list that contains 93 different strings. I need to find the 10 most frequent strings, and the result must be ordered from most frequent to least frequent.

mylist = ['"and', '"beware', '`twas', 'all', 'all', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig']
# This is just a sample of the actual list.

I don't have the newest version of Python and cannot use Counter.

Upvotes: 1

Views: 9051

Answers (5)

Akavall

Reputation: 86168

David's solution is the best.

But, probably more for fun than anything else, here is a solution that does not import any modules:

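dicto = {}

for ele in mylist:
    try:
        dicto[ele] += 1  # word seen before: bump its count
    except KeyError:
        dicto[ele] = 1   # first occurrence of this word

# Sort (word, count) pairs by count, highest first, and keep the top 10
top_10 = sorted(dicto.items(), key=lambda k: k[1], reverse=True)[:10]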

Result:

>>> top_10
[('and', 13), ('all', 2), ('as', 2), ('borogoves', 2), ('boy', 1), ('blade', 1), ('bandersnatch', 1), ('beware', 1), ('bite', 1), ('arms', 1)]
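A variant of the same counting loop uses dict.get with a default of 0 instead of try/except. The sketch below wraps it in a helper; the name top_n_words is hypothetical, chosen for illustration. It should run on both Python 2 and Python 3.

```python
def top_n_words(words, n=10):
    """Return the n most frequent (word, count) pairs, most frequent first."""
    counts = {}
    for word in words:
        # dict.get returns 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    # Sort by count, highest first; words with equal counts keep
    # whatever order the dict yields them in
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:n]

mylist = ['all', 'all', 'and', 'and', 'and', 'arms', 'as', 'as', 'boy']
print(top_n_words(mylist, 3))
```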

EDIT:

Answering the follow-up question (ordering words that have the same count alphabetically):

# Group the words by how often they occur: {count: [word, ...]}
new_dicto = {}

for word, count in dicto.items():
    try:
        new_dicto[count].append(word)
    except KeyError:
        new_dicto[count] = [word]

# Highest count first, with each group of words sorted alphabetically
alph_sorted = sorted([(count, sorted(words)) for count, words in new_dicto.items()], reverse=True)

Result:

>>> alph_sorted
[(13, ['and']), (2, ['all', 'as', 'borogoves']), (1, ['"and', '"beware', '`twas', 'arms', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'boy', 'brillig'])]

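Within each count, the words are sorted alphabetically. You may notice that some words have extra quotation marks in them; those sort ahead of the plain words because '"' and '`' come before the lowercase letters in ASCII.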

EDIT:

Answering another follow-up question (taking just the first 10 words from that ordering):

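top_10 = []

# Collect words group by group (highest count first) and stop the
# outer loop as soon as there are at least 10 words
for count, words in alph_sorted:
    top_10.extend(words)
    if len(top_10) >= 10:
        break

top_10 = top_10[:10]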

Result:

>>> top_10
['and', 'all', 'as', 'borogoves', '"and', '"beware', '`twas', 'arms', 'awhile', 'back']
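Both follow-ups can also be handled with a single sort by using a composite key: negate the count so higher counts come first, and use the word itself to break ties alphabetically. A sketch, using the sample list from the question (rebuilt here so the block runs on its own) and the same dict counting as above:

```python
mylist = ['"and', '"beware', '`twas', 'all', 'all'] + ['and'] * 13 + [
    'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish',
    'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy',
    'brillig']

# Count occurrences of each word
dicto = {}
for ele in mylist:
    dicto[ele] = dicto.get(ele, 0) + 1

# Sort by descending count, then alphabetically within equal counts
ranked = sorted(dicto.items(), key=lambda kv: (-kv[1], kv[0]))
top_10 = [word for word, count in ranked[:10]]
print(top_10)
```

For the sample list this gives the same ten words as the nested loop above, without building the intermediate count-to-words dictionary.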

Upvotes: 3

jamylak

Reputation: 133514

Without using Counter, as the modified version of the question requests:

Changed to use heapq.nlargest, as suggested by @Duncan.

>>> from collections import defaultdict
>>> from operator import itemgetter
>>> from heapq import nlargest
>>> mylist = ['"and', '"beware', '`twas', 'all', 'all', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig']
>>> c = defaultdict(int)
>>> for item in mylist:
...     c[item] += 1
...
>>> [word for word, freq in nlargest(10, c.items(), key=itemgetter(1))]
['and', 'all', 'as', 'borogoves', 'boy', 'blade', 'bandersnatch', 'beware', 'bite', 'arms']
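Because nlargest only compares the counts, the words seen once can come out in any order. One way to keep the heap-based selection but make ties deterministic is heapq.nsmallest with the composite key (-count, word): the smallest keys are the highest counts, with alphabetical order among equal counts. A minimal sketch; the function name top_words is hypothetical:

```python
from collections import defaultdict
from heapq import nsmallest

def top_words(words, n=10):
    """Return the n most frequent words; ties are broken alphabetically."""
    counts = defaultdict(int)
    for word in words:
        counts[word] += 1
    # Smallest (-count, word) means highest count first, then alphabetical
    best = nsmallest(n, counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, freq in best]
```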

Upvotes: 2

Abhijit

Reputation: 63707

If your Python version does not support Counter, you can do what Counter itself does internally:

>>> import operator, collections, heapq
>>> counter = collections.defaultdict(int)
>>> for elem in mylist:
...     counter[elem] += 1
...
>>> heapq.nlargest(10, counter.items(), key=operator.itemgetter(1))
[('and', 13), ('all', 2), ('as', 2), ('borogoves', 2), ('boy', 1), ('blade', 1), ('bandersnatch', 1), ('beware', 1), ('bite', 1), ('arms', 1)]

If you look at the Counter class, it builds a dictionary holding the number of occurrences of every element in the iterable. Its most_common(n) method then uses heapq.nlargest, keyed on the dictionary values, to retrieve the n largest entries.
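A minimal sketch of that idea, as a hypothetical SimpleCounter class (it is not the real collections.Counter, which supports much more): it subclasses dict to hold the counts and uses heapq.nlargest in most_common.

```python
import heapq
from operator import itemgetter

class SimpleCounter(dict):
    """Hypothetical, stripped-down stand-in for collections.Counter."""

    def __init__(self, iterable=()):
        dict.__init__(self)
        for elem in iterable:
            # Count each occurrence; missing keys start at 0
            self[elem] = self.get(elem, 0) + 1

    def most_common(self, n=None):
        # With no n, return every (elem, count) pair sorted by count
        if n is None:
            return sorted(self.items(), key=itemgetter(1), reverse=True)
        # Otherwise use a heap to pick only the n largest counts
        return heapq.nlargest(n, self.items(), key=itemgetter(1))
```

Usage mirrors Counter: SimpleCounter(mylist).most_common(10).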

Upvotes: 1

Burhan Khalid

Reputation: 174624

David's answer is the best, but if you are using a version of Python that does not include Counter in the collections module (it was introduced in Python 2.7), you can use this implementation of a counter class, which does the same thing. I suspect it is slower than the standard-library version, but it gives the same results.

Upvotes: 3

David Alber

Reputation: 18101

You could use a Counter from the collections module to do this.

from collections import Counter
c = Counter(mylist)

Then c.most_common(10) returns:

[('and', 13),
 ('all', 2),
 ('as', 2),
 ('borogoves', 2),
 ('boy', 1),
 ('blade', 1),
 ('bandersnatch', 1),
 ('beware', 1),
 ('bite', 1),
 ('arms', 1)]
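Since the question asks for the strings themselves rather than (string, count) pairs, the counts can be dropped with a list comprehension. A short sketch, using the sample list from the question:

```python
from collections import Counter

mylist = ['"and', '"beware', '`twas', 'all', 'all'] + ['and'] * 13 + [
    'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish',
    'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy',
    'brillig']

c = Counter(mylist)
# most_common(10) yields (word, count) pairs, highest count first;
# keep only the words
top_10 = [word for word, count in c.most_common(10)]
print(top_10)
```

In Python 3.7 and later, most_common() lists elements with equal counts in the order they were first encountered; older versions make no such promise, so sort with a tie-breaking key if the order of ties matters.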

Upvotes: 16
