donopj2
donopj2

Reputation: 4123

Python Sort On The Fly

I am thinking about a problem I haven't encountered before and I'm trying to determine the most efficient algorithm to use.

I am iterating over two lists, using each pair of elements to calculate a value that I wish to sort on. My end goal is to obtain the top twenty results. I could store the results in a third list, sort that list by absolute value, and simply slice the top twenty, but that is not ideal.

Since these lists have the potential to become extremely large, I'd ideally like to only store the top twenty absolute values, evicting old values as a new top value is calculated.

What would be the most efficient way to implement this in python?

Upvotes: 7

Views: 1094

Answers (4)

ElmoVanKielmo
ElmoVanKielmo

Reputation: 11315

I know that the best answer is already chosen but for educational purposes you can consider mine as well.

I hope there is no typo:

def some_name(list_a, list_b):
    if len(list_a) != len(list_b):
        raise Exception("Too bad")
    result_list = []
    for result in (list_a[i] + list_b[i] for i in range(len(list_a))):
        if len(result_list) >= 20:
            if result_list[0] > result:
                continue
            result_list = result_list[1:]
        result_list.append(result)
        result_list.sort()

And a after some refactoring - it does almost what heapq.nlargest would do (ofcourse here we have to keep results sorted on our own):

def some_name(list_a, list_b):
    if len(list_a) != len(list_b):
        raise Exception("Too bad")
    result_list = []
    for result in (list_a[i] + list_b[i] for i in range(len(list_a))):
        result_list.append(result)
        result_list.sort()
        result_list = result_list[-20:]

Upvotes: 1

Jon Clements
Jon Clements

Reputation: 142226

You can use izip to iterate the two lists in parallel, and build a generator to lazily do a calculation over them, then heapq.nlargest to effectively keep the top n:

from itertools import izip
import heapq

list_a = [1, 2, 3]
list_b = [3, 4, 7]

vals = (abs(a - b) for a, b in izip(list_a, list_b))
print heapq.nlargest(2, vals)

Upvotes: 4

Steve Barnes
Steve Barnes

Reputation: 28405

Have a list of size 20 tupples initialised with less than the minimum result of the calculation and two indices of -1. On calculating a result append it to the results list, with the indices of the pair that resulted, sort on the value only and trim the list to length 20. Should be reasonably efficient as you only ever sort a list of length 21.

Upvotes: 1

arshajii
arshajii

Reputation: 129587

Take a look at heapq.nlargest:

heapq.nlargest(n, iterable[, key])

Return a list with the n largest elements from the dataset defined by iterable. key, if provided, specifies a function of one argument that is used to extract a comparison key from each element in the iterable: key=str.lower Equivalent to: sorted(iterable, key=key, reverse=True)[:n]

Upvotes: 11

Related Questions