borthkror
borthkror

Reputation: 83

Python Sort string by frequency - cannot sort with sorted() function

I have an issue with sorting a simple string by frequency (I get a string as an input, and I need to give a sorted string back as an output in descending order). Let me give you an example (the original word contains 4 e's, 2 s's, 1 t, 1 r and 1 d; so these get sorted):

In [1]: frequency_sort("treeseeds")
Out [1]: "eeeesstrd"

Most solutions on Stack Overflow state that I should use the sorted() function to get my results, however, it only seems to work with certain cases.

I made two functions that supposed to work, but none of them seems to do the trick with my specific inputs (see below).

First function:

def frequency_sort(s):
    s_sorted = sorted(s, key=s.count, reverse=True)
    s_sorted = ''.join(c for c in s_sorted)
    return s_sorted

Second function:

import collections
def frequency_sort_with_counter(s):
    counter = collections.Counter(s)
    s_sorted = sorted(s, key=counter.get, reverse=True)
    s_sorted = ''.join(c for c in s_sorted)
    return s_sorted

With both functions my outputs look like this:

The first output is okay:

In [1]: frequency_sort("loveleee")
Out [1]: "eeeellov"

The second output is not so much

In [2]: frequency_sort("loveleel")
Out [2]: "leleelov"

The third output is totally messy:

In [3]: frequency_sort("oloveleelo")
Out [3]: "oloeleelov"

What could have gone wrong? Is it connected to the 'o' and 'l' characters somehow? Or am I just missing something?

Upvotes: 4

Views: 493

Answers (3)

MSeifert
MSeifert

Reputation: 152667

The problem is that sort and sorted are stable sorts. So if two values are "equal" (in this case key(item1) == key(item2)) they will appear in the same order as they were before the sort.

For example in your last case you have:

>>> from collections import Counter

>>> Counter("oloveleelo")
Counter({'e': 3, 'l': 3, 'o': 3, 'v': 1})

So 'e', 'l' and 'o' have the same key, so they will appear just like they did originally: "oloeleelo" and then just comes the only character that has a different count: 'v'.

If you don't care about the order of elements with equal counts (just that they are grouped by character) you don't even need sorted, just flatten the result of Counter.most_common:

>>> ''.join([item for item, cnt in Counter("oloveleelo").most_common() for _ in range(cnt)])
'llleeeooov'
>>> ''.join([item for item, cnt in Counter("loveleel").most_common() for _ in range(cnt)])
'eeelllov'
>>> ''.join([item for item, cnt in Counter("loveleee").most_common() for _ in range(cnt)])
'eeeellov'

Upvotes: 0

Mohd
Mohd

Reputation: 5613

In your third case, 3 letters have the same count that's why they are put together, you can sort it first(by alphabet) then sort it by frequency to arrange the letters as the following:

s_sorted = ''.join(sorted(sorted(s), key=lambda x: s.count(x), reverse=True))

output:

eeellloooav

or you can reverse it:

s_sorted = ''.join(sorted(sorted(s, reverse=True), key=lambda x: s.count(x), reverse=True))

output:

ooollleeeva

Upvotes: 0

jakevdp
jakevdp

Reputation: 86328

In a string where multiple characters have the same frequency, the algorithms you proposed have no way of distinguishing between characters that appear the same number of times. You could address this by sorting using a tuple of the frequency and the character itself; e.g.

In [7]: def frequency_sort(s):
        s_sorted = sorted(s, key=lambda c: (s.count(c), c), reverse=True)
        s_sorted = ''.join(c for c in s_sorted)
        return s_sorted
   ...: 

In [8]: frequency_sort("loveleel")
Out[8]: 'llleeevo'

Upvotes: 5

Related Questions