Reputation: 209
I have a list that has 93 different strings. I need to find the 10 most frequent strings, returned in order from most frequent to least frequent.
mylist = ['"and', '"beware', '`twas', 'all', 'all', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig']
# this is just a sample of the actual list.
I don't have the newest version of Python and cannot use Counter.
Upvotes: 1
Views: 9051
Reputation: 86168
David's solution is the best.
But probably more for fun than anything, here is a solution that does not import any module:
dicto = {}
for ele in mylist:
    try:
        dicto[ele] += 1
    except KeyError:
        dicto[ele] = 1
top_10 = sorted(dicto.iteritems(), key=lambda k: k[1], reverse=True)[:10]
Result:
>>> top_10
[('and', 13), ('all', 2), ('as', 2), ('borogoves', 2), ('boy', 1), ('blade', 1), ('bandersnatch', 1), ('beware', 1), ('bite', 1), ('arms', 1)]
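For readers on a newer interpreter, here is a minimal sketch of the same idea that runs on both Python 2 and Python 3. The function name `top_n` and the short `sample` list are made up for illustration; the only change from the code above is `dict.get` in place of the try/except and `items()` in place of `iteritems()`.

```python
def top_n(words, n=10):
    """Return the n most frequent (word, count) pairs, most frequent first."""
    counts = {}
    for word in words:
        # dict.get returns 0 for a word not seen yet, so no try/except is needed
        counts[word] = counts.get(word, 0) + 1
    # items() exists on both Python 2 and 3; iteritems() is Python 2 only
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical sample data, much shorter than the list in the question
sample = ['all', 'all', 'and', 'and', 'and', 'as']
print(top_n(sample, 2))
```

The order of words with equal counts depends on the dictionary's iteration order, so ties are not guaranteed to come out alphabetically.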
EDIT:
Answering the follow up question:
new_dicto = {}
for key, val in dicto.iteritems():
    # group the words by their count: new_dicto maps count -> list of words
    try:
        new_dicto[val].append(key)
    except KeyError:
        new_dicto[val] = [key]
alph_sorted = sorted([(key, sorted(val)) for key, val in new_dicto.iteritems()], reverse=True)
Result:
>>> alph_sorted
[(13, ['and']), (2, ['all', 'as', 'borogoves']), (1, ['"and', '"beware', '`twas', 'arms', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'boy', 'brillig'])]
Words with the same count are sorted alphabetically. Note that some words have extra quotation marks in them, which is why they sort ahead of the others.
EDIT:
Answering another follow up question:
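top_10 = []
for tup in alph_sorted:
    for word in tup[1]:
        top_10.append(word)
        if len(top_10) == 10:
            break
    # the inner break only exits the inner loop, so stop the outer loop too
    if len(top_10) == 10:
        break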
Result:
>>> top_10
['and', 'all', 'as', 'borogoves', '"and', '"beware', '`twas', 'arms', 'awhile', 'back']
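The two follow-up steps (group by count, then flatten to ten words) can probably be collapsed into a single sort. The sketch below assumes that "most frequent first, alphabetical within ties" is the ordering wanted; `top_words` and `sample` are hypothetical names, not part of the original code.

```python
def top_words(words, n=10):
    """Return the n most frequent words; ties are broken alphabetically."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    # Negating the count sorts high counts first, while the word itself
    # sorts ascending, so equal counts come out in alphabetical order
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:n]]


# Hypothetical sample data
sample = ['boy', 'as', 'and', 'as', 'and', 'all', 'and', 'all']
print(top_words(sample, 3))
```

Because the tie-break is part of the sort key, the result does not depend on dictionary ordering.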
Upvotes: 3
Reputation: 133514
Without using Counter, as the modified version of the question requests.
Changed to use heapq.nlargest, as suggested by @Duncan.
>>> from collections import defaultdict
>>> from operator import itemgetter
>>> from heapq import nlargest
>>> mylist = ['"and', '"beware', '`twas', 'all', 'all', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig']
>>> c = defaultdict(int)
>>> for item in mylist:
...     c[item] += 1
...
>>> [word for word,freq in nlargest(10,c.iteritems(),key=itemgetter(1))]
['and', 'all', 'as', 'borogoves', 'boy', 'blade', 'bandersnatch', 'beware', 'bite', 'arms']
Upvotes: 2
Reputation: 63707
In case your Python version does not support Counter, you can do it the way Counter is implemented:
>>> import operator,collections,heapq
>>> counter = collections.defaultdict(int)
>>> for elem in mylist:
...     counter[elem] += 1
...
>>> heapq.nlargest(10,counter.iteritems(),operator.itemgetter(1))
[('and', 13), ('all', 2), ('as', 2), ('borogoves', 2), ('boy', 1), ('blade', 1), ('bandersnatch', 1), ('beware', 1), ('bite', 1), ('arms', 1)]
If you look at the Counter class, it builds a dictionary of the occurrences of all the elements in the iterable. Its most_common method then passes the items to heapq.nlargest, keyed on the count, to retrieve the n largest.
Upvotes: 1
Reputation: 174624
David's answer is the best - but if you are using a version of Python that does not include Counter from the collections module (which was introduced in Python 2.7), you can use this implementation of a counter class. I suspect that it would be slower than the built-in class, but it will do the same thing.
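The linked implementation is not reproduced here. As a rough idea of what such a fallback might look like, here is a minimal sketch; `SimpleCounter` is a hypothetical name and covers only counting and `most_common`, not the full Counter API.

```python
import heapq
from operator import itemgetter


class SimpleCounter(dict):
    """Hypothetical minimal stand-in for collections.Counter."""

    def __init__(self, iterable=()):
        dict.__init__(self)
        for item in iterable:
            self[item] = self.get(item, 0) + 1

    def most_common(self, n=None):
        # With no n, sort every item; otherwise keep only the n largest counts
        if n is None:
            return sorted(self.items(), key=itemgetter(1), reverse=True)
        return heapq.nlargest(n, self.items(), key=itemgetter(1))


# Hypothetical sample data
c = SimpleCounter(['and', 'and', 'all', 'and', 'all', 'as'])
print(c.most_common(2))
```

It relies only on dict and heapq, both of which are available in Python versions that predate Counter.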
Upvotes: 3
Reputation: 18101
You could use a Counter from the collections module to do this.
from collections import Counter
c = Counter(mylist)
Then doing c.most_common(10) returns
[('and', 13),
('all', 2),
('as', 2),
('borogoves', 2),
('boy', 1),
('blade', 1),
('bandersnatch', 1),
('beware', 1),
('bite', 1),
('arms', 1)]
Upvotes: 16