Reputation: 995
I'm currently working on moving some excel worksheets over to python automation, and have come across a speed issue.
I have a list of lists containing around 10.000 lists each with 20 or so columns. I also have a list of account numbers (100.000 numbers)
I wish to iterate over my list of lists and then pick out values from the list, if the account number in the lists matches one in the account list.
By running this code, i am able to get my desired result, however it is painfully slow.
calc = 0
for row in listOfLists:
if row[1] in Accounts:
calc += row[8]
Any ideas on how to optimize for speed?
Upvotes: 0
Views: 81
Reputation: 73450
Make Accounts
a set
. The contains check is O(1)
for sets, but O(N)
for lists. Your entire process' time complexity will be reduced from O(M*N)
to O(M+N)
. You can further use sum
with a conditional generator expression:
account_set = set(Accounts)
calc = sum(row[8] for row in listOfLists if row[1] in account_set)
Upvotes: 7
Reputation: 42678
A functional approach of this, using the set
over Accounts
as @schwobaseggle answered:
account_set = set(Accounts)
calc = reduce(lambda row: row[8] if row[1] in account_set else 0, listOfLists , 0)
Upvotes: 2