user2180683

Reputation: 219

How to organize list by frequency of occurrence and alphabetically (in case of a tie) while eliminating duplicates?

Basically if given a list:

data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]

I'm trying to make a function that returns a list like this:

["apple", "pear", "banana", "cherry"]

I want the returned list ordered with the most frequently occurring word first, breaking ties alphabetically, and with duplicates removed.

I've already made lists of the counts of each element and the index of each element in data.

x = [data.count(n) for n in data]
z = [data.index(n) for n in data]

I don't know where to go from this point.

Upvotes: 8

Views: 2595

Answers (3)

Akavall

Reputation: 86266

Here is a simple approach that should work.

data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]

from collections import Counter
from collections import defaultdict

my_counter = Counter(data)

# creates a dictionary with keys
# being numbers of occurrences and
# values being lists of the strings
# that occurred that many times
my_dict = defaultdict(list)
for k, v in my_counter.items():
    my_dict[v].append(k)

my_list = []

for k in sorted(my_dict, reverse=True):
    # Tie-break: strings that occurred the same
    # number of times share the same key, so we
    # sort them alphabetically
    my_list.extend(sorted(my_dict[k]))

Result:

>>> my_list
['apple', 'pear', 'banana', 'cherry']

Upvotes: 0

Blender

Reputation: 298394

You could do something like this:

from collections import Counter

data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]

counts = Counter(data)
words = sorted(counts, key=lambda word: (-counts[word], word))

print(words)
# ['apple', 'pear', 'banana', 'cherry']
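The sort key is a tuple: negating the count puts the most frequent words first, and the word itself breaks ties alphabetically. Iterating over the Counter only yields each distinct word once, which takes care of the duplicates. As a minimal sketch, the same idea wrapped in a function might look like this (the name by_frequency is just an illustrative choice, not something from the question):

```python
from collections import Counter


def by_frequency(items):
    """Return the unique items, most frequent first, ties in alphabetical order."""
    counts = Counter(items)
    # Negating the count sorts high counts first; the item itself breaks ties.
    return sorted(counts, key=lambda item: (-counts[item], item))


data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
result = by_frequency(data)
print(result)
```

With the sample data this should print the list asked for in the question.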

Upvotes: 16

user1786283

Reputation:

To order elements by frequency you can use collections.Counter.most_common, for example:

from collections import Counter

data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
print(Counter(data).most_common())
# [('apple', 3), ('pear', 2), ('cherry', 1), ('banana', 1)]

Thanks to @Yuushi, to get only the words:

from collections import Counter

data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
x = [a for (a, b) in Counter(data).most_common()]

print(x)
# ['apple', 'pear', 'cherry', 'banana']
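Note that most_common() on its own does not break ties alphabetically, which is why cherry comes before banana above. One possible workaround is to sort the input before counting. This is only a sketch and it assumes Python 3.7 or later, where elements with equal counts are returned in the order they were first encountered; the name ordered is just for illustration:

```python
from collections import Counter

data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]

# Sorting first means words with equal counts are first encountered in
# alphabetical order, and most_common() keeps that order for ties
# (assumption: Python 3.7+ behaviour).
ordered = [word for word, _ in Counter(sorted(data)).most_common()]

print(ordered)
```

If you need this to work on older versions, an explicit sort key on (-count, word) is the safer choice.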

Upvotes: 3
