Reputation: 219
Given a list such as:
data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
I'm trying to make a function that returns a list like this:
["apple", "pear", "banana", "cherry"]
I want the returned list ordered by frequency, most frequently occurring word first, with ties broken alphabetically. I also want duplicates eliminated.
I've already made lists of the count of each element and the index of each element in data:
x = [data.count(n) for n in data]
z = [data.index(n) for n in data]
I don't know where to go from this point.
Upvotes: 8
Views: 2595
Reputation: 86266
Here is a simple approach that should work.
data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
from collections import Counter
from collections import defaultdict
my_counter = Counter(data)
# Build a dictionary whose keys are
# occurrence counts and whose values are
# lists of the strings that occurred
# that many times
my_dict = defaultdict(list)
for k, v in my_counter.items():
    my_dict[v].append(k)
my_list = []
for k in sorted(my_dict, reverse=True):
    # This is the tie-break: strings that showed up
    # the same number of times share the same key,
    # so we sort them alphabetically
    my_list.extend(sorted(my_dict[k]))
Result:
>>> my_list
['apple', 'pear', 'banana', 'cherry']
Upvotes: 0
Reputation: 298394
You could do something like this:
from collections import Counter
data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
counts = Counter(data)
words = sorted(counts, key=lambda word: (-counts[word], word))
print(words)
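Since the question asks for a function, here is a minimal sketch of the same sort key wrapped in one. The name rank_by_frequency is just an illustrative choice, not anything from the standard library. The key is a tuple: the negated count makes higher counts sort first, and the word itself breaks ties alphabetically.

```python
from collections import Counter


def rank_by_frequency(items):
    """Return the unique items, most frequent first, ties in alphabetical order."""
    counts = Counter(items)
    # Tuples compare element by element: -count first (so larger counts
    # come earlier), then the word itself for equal counts.
    return sorted(counts, key=lambda word: (-counts[word], word))


data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
print(rank_by_frequency(data))
# ['apple', 'pear', 'banana', 'cherry']
```

Iterating over a Counter yields each distinct key once, so the duplicates disappear without an extra set() call.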
Upvotes: 16
Reputation:
To order elements by frequency you can use the most_common method of collections.Counter (see its documentation), for example:
from collections import Counter
data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
print(Counter(data).most_common())
#[('apple', 3), ('pear', 2), ('cherry', 1), ('banana', 1)]
Thanks to @Yuushi,
from collections import Counter
data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
x = [a for (a, b) in Counter(data).most_common()]
print(x)
#['apple', 'pear', 'cherry', 'banana']
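Note that the output above has 'cherry' before 'banana': most_common does not break ties alphabetically (on Python 3.7+, elements with equal counts come out in first-encountered order). One possible way to get the alphabetical tie-break, sketched here under the assumption that a plain Counter is still the starting point, is to rely on sorted being stable: sort alphabetically first, then by count.

```python
from collections import Counter

data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
counts = Counter(data)

# Pass 1: alphabetical order of the distinct words.
alphabetical = sorted(counts)

# Pass 2: order by count, highest first. sorted() is stable (also with
# reverse=True), so words with equal counts keep their alphabetical order.
ranked = sorted(alphabetical, key=counts.get, reverse=True)

print(ranked)
# ['apple', 'pear', 'banana', 'cherry']
```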
Upvotes: 3