python select sublist effectively

Question

I have a long (many Millions) list of Transactions in the form [account_id, transactiontyp, data], each account has many Transactions. I want to select all transactions for a given short list (approx. 20000) of accounts. Example:

longlist=[['a','t1',5],['a','t1',9],['b','t1',3],['c','t5',8]]

shortlist=['a','c']

The following code does the trick, but is extremly slow:

selection=[sel for sel in longlist if sel[0] in shortlist]

There must be faster ways to accomplish that? I tried

def select_sample(longlist,shortlist):
    ret=[]
    for elem in longlist:
        if elem[0] in shortlist:
            ret.append(elem)
    return ret

to see how it scales, as expected it is linear in the size of longlist. Longlist is sorted by account_id. I do not have a unique key in longlist to use dictionaries. Is there something like an index-technique, which I could use?

Antti Haapala -- Слава Україні · Accepted Answer

Using set (shortlist = set(['a', 'c']) would have a break-even at around 20 accounts. I'd expect it to be at least 2 decades faster if shortlist has 20k accounts.

If this selection is repeated many times, you'd benefit even more by using a dictionary that maps account name to list of transactions for that specific account.

However, all in all this smells like an XY problem. Surely you should use a RDBMS to manage this data? They have got efficient algorithms to handle exactly these kinds of queries.

python select sublist effectively

Answers (1)

Related Questions