Reputation: 573
Having a list with N (large number) elements:
from random import randint
eles = [randint(0, 10) for i in range(3000000)]
I'm trying to implement the best way (performance/resources spent) this function below:
def mosty(lst):
sort = sorted((v, k) for k, v in enumerate(lst))
count, maxi, last_ele, idxs = 0, 0, None, []
for ele, idx in sort:
if(last_ele != ele):
count = 1
idxs = []
idxs.append(idx)
if(last_ele == ele):
count += 1
if(maxi < count):
results = (ele, count, idxs)
maxi = count
last_ele = ele
return results
This function returns the most common element, number of occurrences, and the indexes where it was found.
Here is the benchmark with 300000 eles.
But I think I could improve, one of the reasons being python3 sorted
function (timsort), if it returned a generator I didn't have to loop through the list twice right?
My questions are:
Is there any way for this code to be optimized? How?
With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort
Upvotes: 0
Views: 148
Reputation: 2910
If you are willing to use numpy, then you can do something like this:
arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)
HTH.
Upvotes: 2
Reputation: 46911
did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):
from collections import Counter
from random import randint
eles = [randint(0, 10) for i in range(30)]
counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]
print(most_common_element, number_of_occurrences, indices)
and the indices (the second iteration) can be found lazily in a generator expression:
indices = (i for i, x in enumerate(eles) if x == most_common_element)
if you need to care about multiple elements being the most common, this might work for you:
from collections import Counter
from itertools import groupby
from operator import itemgetter
eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = {key: [] for key in most_common}
for i, x in enumerate(eles):
if x in indices:
indices[x].append(i)
print(most_common)
print(indices)
you could of course still make the indices
lazy the same way as above.
Upvotes: 2