Reputation: 5748
How should I do this in the fastest way?
I have a .h5 file with some tables. Each table has around 10 million (or more) rows.
The whole file is around 10 GB, so it does not fit in memory.
The tables are "linked": all of them share the same column (ID), which is used to join them.
Now, if I call my tables table1, table2, table3, table4, etc., I am looking for the fastest way to search table2 using the ID data from table1.
As an example, this is what I have done so far:
# search table1 and get the IDs matching the first condition
searchID = "".join(["(ID==%i)|" % j['ID'] for j in table1.where('some conditions for table1')])[:-1]

# search table2 based on the IDs from table1
for row in table2.where(searchID):
    pass  # do something with row
The problem is that I do not think this is a very efficient solution, and I have noticed that if searchID grows a lot, Spyder just crashes.
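One way to sidestep the ever-growing condition string (a sketch, not tested against your file: `join_by_id` is a hypothetical helper, and the commented PyTables lines assume the table names from the question) is to read the matching IDs from table1 into a Python set once, then scan table2 a single time and test membership:

```python
def join_by_id(matching_ids, rows):
    """Yield rows whose 'ID' field is in matching_ids.

    matching_ids: any iterable of ID values (materialized into a set once).
    rows: an iterable of mappings with an 'ID' key, e.g. the Row objects
    yielded by table2.iterrows() in PyTables.
    """
    wanted = set(matching_ids)  # set gives O(1) membership tests
    for row in rows:
        if row['ID'] in wanted:
            yield row

# With PyTables it would look roughly like this (untested sketch;
# 'some conditions for table1' is the question's placeholder):
#   ids = (r['ID'] for r in table1.where('some conditions for table1'))
#   for row in join_by_id(ids, table2.iterrows()):
#       pass  # do something with row
```

The memory cost is one set of IDs rather than one huge query string that PyTables has to parse, and table2 is read sequentially exactly once.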
Upvotes: 2
Views: 847
Reputation: 3637
There are a couple of things that you could do to potentially make this faster, though there is no silver bullet.
If you can combine all of the tables into one table with more columns, then you wouldn't have to loop through twice.
You could index the tables based on ID. This would improve search performance.
Change the chunkshape of the tables to be more optimal for your problem. Making it smaller should help with the crashes.
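The second and third suggestions map onto PyTables calls roughly like this (a minimal sketch on a toy file standing in for the 10 GB one; the file path, table layout, and chunkshape value are assumptions, while `create_csindex` and the `chunkshape` keyword of `Table.copy` are part of the PyTables API):

```python
import os
import tempfile

import tables

# A tiny stand-in file; with the real 10 GB file you would open it
# in append mode instead of creating it.
path = os.path.join(tempfile.mkdtemp(), "tuned.h5")

with tables.open_file(path, mode="w") as h5:
    table2 = h5.create_table("/", "table2", {"ID": tables.Int64Col()})
    row = table2.row
    for i in range(1000):
        row["ID"] = i
        row.append()
    table2.flush()

    # 1) Index the ID column: queries such as table2.where('ID == 42')
    #    can then use the index instead of scanning every row.
    table2.cols.ID.create_csindex()

    # 2) chunkshape is fixed when a table is created, so to shrink it
    #    you copy the table, passing the new chunkshape.
    table2.copy(newname="table2_rechunked", chunkshape=(256,))
```

Indexing speeds up repeated `where()` lookups on ID at the cost of some extra disk space, while a smaller chunkshape reduces how much data each read pulls into memory.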
Upvotes: 1