Reputation: 81
I have a list of files:
example_list = ['7.gif', '8.gif', '123.html']
There are over 700k elements and I need to sort them by frequency to see the most accessed file and least accessed file.
for i in resl:
    if resl.count(i) > 500:
        resl2.append(i)
print(resl2)
When I run this it never finishes. I have tried other methods, but with no results.
Upvotes: 1
Views: 95
Reputation: 170
You can do this trick using a set ;)
Here you have a minimal example for a list of files and showing when it appears 2 times:
files = ['10.gif', '8.gif', '0.gif', '0.doc', '0.gif', '0.gif', '0.tmp', '0.doc', '0.gif']
file_set = set(files)
files_freq = [0]*len(file_set)
for n,file in enumerate(file_set):
    files_freq[n] = files.count(file)
sorted_list = [f for n,f in sorted(zip(files_freq, file_set), key=lambda x: x[0], reverse=True) if n >= 2]
print(sorted_list)
and the output will be: ['0.gif', '0.doc']
The set reduces the list to the unique file names, and the loop then counts how many times each one appears in the original list.
After that, the list comprehension does the trick:
[f for n,f in sorted(zip(files_freq, file_set), key=lambda x: x[0], reverse=True) if n >= 2]
This creates a list containing only the files that appear 2 or more times. The key argument makes sorted use the first element of each pair from zip(files_freq, file_set), that is the frequency, for sorting, and reverse=True sorts in descending order, so the highest frequencies come first.
Upvotes: 0
Reputation: 27609
From your comment:
I just need to find out which file occurs the most.
So:
import statistics
statistics.mode(example_list)
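A minimal sketch of that call on a small made-up list (the sample data is an assumption, not the real access log). Note that on Python 3.8 and later, statistics.mode returns the first mode encountered when several values tie, while earlier versions raise StatisticsError in that case.

```python
import statistics

# Hypothetical sample standing in for the real 700k-element list.
example_list = ['7.gif', '8.gif', '123.html', '7.gif']

# mode() returns the single most frequent element.
most_accessed = statistics.mode(example_list)
print(most_accessed)
```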
Upvotes: 1
Reputation: 95957
Your algorithm is unnecessarily quadratic time, because resl.count(i) scans the whole list for every element. The following is linear:
from collections import Counter
resl2 = [k for k,v in Counter(resl).items() if v > 500]
If you need them sorted, then do something like
resl2 = [(k,v) for k,v in Counter(resl).items() if v > 500]
resl2.sort(key=lambda kv: kv[1])
resl2 = [k for k,v in resl2]
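If both ends of the ranking are needed (the most and the least accessed file), Counter.most_common() called with no argument returns every (element, count) pair ordered from most to least frequent. A minimal sketch, with a small made-up list standing in for resl:

```python
from collections import Counter

# Hypothetical sample standing in for the real list of accessed files.
resl = ['7.gif', '8.gif', '123.html', '7.gif', '7.gif', '8.gif']

# One pass over the list to count every file.
counts = Counter(resl)

# All (file, count) pairs, most frequent first.
ranked = counts.most_common()

most_accessed = ranked[0]    # pair with the highest count
least_accessed = ranked[-1]  # pair with the lowest count
print(most_accessed, least_accessed)
```

Counting is a single pass; only the ranking step sorts, and it sorts the unique files rather than all 700k entries.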
Upvotes: 3
Reputation: 7744
Note that i represents an element of the list and not an integer index:
for i in resl:
    if resl.count(i) > 500:
        resl2.append(i)
print(resl2)
Change it to this:
for i in range(len(resl)):
    # i is now an integer index, so this keeps elements by position, not by count
    if i > 500:
        resl2.append(resl[i])
print(resl2)
Upvotes: 0