Reputation: 4123
I am thinking about a problem I haven't encountered before and I'm trying to determine the most efficient algorithm to use.
I am iterating over two lists, using each pair of elements to calculate a value that I wish to sort on. My end goal is to obtain the top twenty results. I could store the results in a third list, sort that list by absolute value, and simply slice the top twenty, but that is not ideal.
Since these lists have the potential to become extremely large, I'd ideally like to only store the top twenty absolute values, evicting old values as a new top value is calculated.
What would be the most efficient way to implement this in python?
Upvotes: 7
Views: 1094
Reputation: 11315
I know that the best answer is already chosen but for educational purposes you can consider mine as well.
I hope there is no typo:
def some_name(list_a, list_b):
if len(list_a) != len(list_b):
raise Exception("Too bad")
result_list = []
for result in (list_a[i] + list_b[i] for i in range(len(list_a))):
if len(result_list) >= 20:
if result_list[0] > result:
continue
result_list = result_list[1:]
result_list.append(result)
result_list.sort()
And a after some refactoring - it does almost what heapq.nlargest
would do (ofcourse here we have to keep results sorted on our own):
def some_name(list_a, list_b):
if len(list_a) != len(list_b):
raise Exception("Too bad")
result_list = []
for result in (list_a[i] + list_b[i] for i in range(len(list_a))):
result_list.append(result)
result_list.sort()
result_list = result_list[-20:]
Upvotes: 1
Reputation: 142226
You can use izip
to iterate the two lists in parallel, and build a generator to lazily do a calculation over them, then heapq.nlargest
to effectively keep the top n
:
from itertools import izip
import heapq
list_a = [1, 2, 3]
list_b = [3, 4, 7]
vals = (abs(a - b) for a, b in izip(list_a, list_b))
print heapq.nlargest(2, vals)
Upvotes: 4
Reputation: 28405
Have a list of size 20 tupples initialised with less than the minimum result of the calculation and two indices of -1. On calculating a result append it to the results list, with the indices of the pair that resulted, sort on the value only and trim the list to length 20. Should be reasonably efficient as you only ever sort a list of length 21.
Upvotes: 1
Reputation: 129587
Take a look at heapq.nlargest
:
heapq.nlargest(n, iterable[, key])
Return a list with the n largest elements from the dataset defined by iterable. key, if provided, specifies a function of one argument that is used to extract a comparison key from each element in the iterable:
key=str.lower
Equivalent to:sorted(iterable, key=key, reverse=True)[:n]
Upvotes: 11