kaushik

Reputation: 5969

Indexing for faster search of lists in a file?

I have a file with around 100k lists, and another file with around 50 lists.

I want to compare the 2nd item of each list in the second file with the 2nd element of the lists in the first file, repeat this for each of the ~50 lists in the second file, and collect all the matching elements.

I have written code for all this, but it is taking a lot of time because it scans the whole 100k-list file around 50 times. I want to improve the speed.

I can't post my code as it is part of a larger codebase, and it would be difficult to infer anything from it.

Upvotes: 2

Views: 158

Answers (1)

Alex Martelli

Reputation: 881565

You can afford to read all the "lakh" (hundred thousand) lines from the first file into memory once:

import collections
d = collections.defaultdict(list)

with open('lakhlists.txt') as f:
    for line in f:
        aslist = line.split()  # assuming whitespace separators
        d[aslist[1]].append(aslist)

You don't give us many crucial parameters, but I'd bet this will fit in memory (for reasonable guesses at list lengths) on typical modern platforms. Assuming this part works, just looping over the other, small file and indexing into d should be trivial in comparison;-)
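To make the lookup phase concrete, here is a minimal self-contained sketch of the whole approach; the in-memory line lists are hypothetical sample data standing in for the two files, since the question's actual format isn't specified beyond whitespace-separated items:

```python
import collections

# Hypothetical sample data standing in for the big file's lines
# and the small file's lines (assumed whitespace-separated).
big_lines = [
    "a1 key1 x",
    "a2 key2 y",
    "a3 key1 z",
]
small_lines = [
    "b1 key1 p",
    "b2 key3 q",
]

# Index the big file's lists by their 2nd element, as above.
d = collections.defaultdict(list)
for line in big_lines:
    aslist = line.split()
    d[aslist[1]].append(aslist)

# Each lookup is now a single dict access instead of a full
# scan over all 100k lists.
matches = []
for line in small_lines:
    aslist = line.split()
    matches.extend(d.get(aslist[1], []))

print(matches)  # the lists from big_lines whose 2nd item matched
```

With the index built once, the total work is roughly O(100k + 50) rather than O(100k x 50).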

If you care to express your specs, and the relevant numbers, more precisely (and ideally in English), maybe more specific help can be offered!

Upvotes: 1
