Hoonin

Reputation: 81

How to sort a list of strings by frequency?

I have a list of file names:

example_list = ['7.gif', '8.gif', '123.html']

There are over 700k elements, and I need to sort them by frequency to find the most and least accessed files.

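resl2 = []
for i in resl:  # resl: the list of file names (~700k entries)
    if resl.count(i) > 500:  # count() scans the whole list on every iteration
        resl2.append(i)
print(resl2)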

When I run this, it never finishes. I have tried other methods too, but with no results.

Upvotes: 1

Views: 95

Answers (4)

Joao Ponte

Reputation: 170

You can do this trick using a set ;)

Here is a minimal example with a list of files, keeping only the files that appear at least 2 times:

files = ['10.gif', '8.gif', '0.gif', '0.doc', '0.gif', '0.gif', '0.tmp', '0.doc', '0.gif']

file_set = set(files)
files_freq = [0]*len(file_set)

for n,file in enumerate(file_set):
    files_freq[n] = files.count(file)

sorted_list = [f for n,f in sorted(zip(files_freq, file_set), key=lambda x: x[0], reverse=True) if n >= 2]
print(sorted_list)

And the output is: ['0.gif', '0.doc']

The set reduces the list to its unique file names, and the loop counts how many times each one occurs.

After that, the spooky list comprehension does the trick!

[f for n,f in sorted(zip(files_freq, file_set), key=lambda x: x[0], reverse=True) if n >= 2]

This builds a list containing only the files that appear 2 or more times. The key argument makes sorted order the (count, file) pairs produced by zip(files_freq, file_set) by their first element, the count, and reverse=True sorts them in descending order so the highest frequencies come first.
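A minimal sketch that wraps the same set-and-count approach in a reusable function. The name `files_by_frequency` and its `min_count` parameter are hypothetical; the function returns (file, count) pairs so both the most and least frequent entries stay visible:

```python
def files_by_frequency(files, min_count=1):
    """Return (file, count) pairs, most frequent first (hypothetical helper).

    Uses the same set + list.count approach as above, so it still scans the
    whole list once per unique file name.
    """
    counts = [(f, files.count(f)) for f in set(files)]
    # Sort by the count (second tuple element), highest first
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return [(f, n) for f, n in counts if n >= min_count]


files = ['10.gif', '8.gif', '0.gif', '0.doc', '0.gif', '0.gif', '0.tmp', '0.doc', '0.gif']
print(files_by_frequency(files, min_count=2))  # [('0.gif', 4), ('0.doc', 2)]
```

Files with equal counts come out in set iteration order, which is not guaranteed to be the same between runs.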

Upvotes: 0

Kelly Bundy

Reputation: 27609

From your comment:

I just need to find out which file occurs the most.

So:

import statistics
statistics.mode(example_list)
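statistics.mode returns a single value. Since Python 3.8, when several files tie for the highest count it returns the first one encountered, while statistics.multimode returns all of them. A small sketch with made-up sample data:

```python
import statistics

example_list = ['7.gif', '8.gif', '7.gif', '123.html', '8.gif']

print(statistics.mode(example_list))       # 7.gif (first of the tied modes, Python 3.8+)
print(statistics.multimode(example_list))  # ['7.gif', '8.gif']
```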

Upvotes: 1

juanpa.arrivillaga

Reputation: 95957

Your algorithm is unnecessarily quadratic time. The following is linear time:

from collections import Counter
resl2 = [k for k,v in Counter(resl).items() if v > 500]
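The linear bound comes from counting every element in one pass instead of calling list.count once per element. A rough hand-written equivalent of what Counter does here; the helper name `count_frequencies` and the sample data are illustrative only:

```python
from collections import Counter


def count_frequencies(items):
    """Count occurrences with a single pass over items (hypothetical helper)."""
    counts = {}
    for item in items:
        # dict lookups and updates are O(1) on average, so the loop is O(n)
        counts[item] = counts.get(item, 0) + 1
    return counts


resl = ['7.gif', '8.gif', '7.gif', '123.html', '7.gif']
counts = count_frequencies(resl)
print(counts)                                   # {'7.gif': 3, '8.gif': 1, '123.html': 1}
print(counts == dict(Counter(resl)))            # True
print([k for k, v in counts.items() if v > 1])  # ['7.gif']
```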

If you need them sorted, then do something like

resl2 = [(k,v) for k,v in Counter(resl).items() if v > 500]
resl2.sort(key=lambda kv: kv[1])
resl2 = [k for k,v in resl2]
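Counter.most_common() already returns (element, count) pairs sorted from most to least frequent, which covers the question's goal of finding the most and least accessed files; elements with equal counts keep the order in which they were first encountered. A sketch with made-up sample data:

```python
from collections import Counter

resl = ['7.gif', '8.gif', '7.gif', '123.html', '7.gif', '8.gif']
ranked = Counter(resl).most_common()  # [(file, count), ...], highest count first
print(ranked)      # [('7.gif', 3), ('8.gif', 2), ('123.html', 1)]
print(ranked[0])   # most accessed: ('7.gif', 3)
print(ranked[-1])  # least accessed: ('123.html', 1)

# Same 500-occurrence threshold as above, but in descending order
frequent = [k for k, v in ranked if v > 500]
```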

Upvotes: 3

AzyCrw4282

Reputation: 7744

Note that i represents an element of the list, not an integer index:

for i in resl:
    if resl.count(i) > 500:
        resl2.append(i)
print(resl2)

Change it to this.

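for i in range(len(resl)):  # i is now an integer index (covers the last element too)
    if i > 500:  # note: this compares the index, not how often resl[i] occurs
        resl2.append(resl[i])
print(resl2)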
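To illustrate the distinction the note draws: iterating over a list yields its elements, range(len(...)) yields integer indices, and enumerate yields both. A small sketch with sample data:

```python
resl = ['7.gif', '8.gif', '123.html']

elements = [i for i in resl]             # ['7.gif', '8.gif', '123.html']
indices = [i for i in range(len(resl))]  # [0, 1, 2]
pairs = list(enumerate(resl))            # [(0, '7.gif'), (1, '8.gif'), (2, '123.html')]

# enumerate avoids indexing back into the list by hand
for index, name in enumerate(resl):
    print(index, name)
```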

Upvotes: 0
