Hula Hula
Hula Hula

Reputation: 573

Lazy sorted function, timsort

Having a list with N (large number) elements:

from random import randint

eles = [randint(0, 10) for i in range(3000000)]

I'm trying to implement the best way (performance/resources spent) this function below:

def mosty(lst):
    sort = sorted((v, k) for k, v in enumerate(lst))
    count, maxi, last_ele, idxs = 0, 0, None, []
    for ele, idx in sort:
        if(last_ele != ele):
            count = 1
            idxs = []
        idxs.append(idx)
        if(last_ele == ele):
            count += 1
            if(maxi < count):
                results = (ele, count, idxs)
                maxi = count
        last_ele = ele
    return results

This function returns the most common element, number of occurrences, and the indexes where it was found.

Here is the benchmark with 300000 eles.

But I think I could improve, one of the reasons being python3 sorted function (timsort), if it returned a generator I didn't have to loop through the list twice right?

My questions are:

Is there any way for this code to be optimized? How?
With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort

Upvotes: 0

Views: 148

Answers (2)

Deepak Saini
Deepak Saini

Reputation: 2910

If you are willing to use numpy, then you can do something like this:

arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)

HTH.

Upvotes: 2

hiro protagonist
hiro protagonist

Reputation: 46911

did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):

from collections import Counter
from random import randint

eles = [randint(0, 10) for i in range(30)]

counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]

print(most_common_element, number_of_occurrences, indices)

and the indices (the second iteration) can be found lazily in a generator expression:

indices = (i for i, x in enumerate(eles) if x == most_common_element)

if you need to care about multiple elements being the most common, this might work for you:

from collections import Counter
from itertools import groupby
from operator import itemgetter

eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = {key: [] for key in most_common}

for i, x in enumerate(eles):
   if x in indices:
        indices[x].append(i)

print(most_common)
print(indices)

you could of course still make the indices lazy the same way as above.

Upvotes: 2

Related Questions