Reputation: 805
I'm looking for the PyTables counterpart of a SQL query in the form of...
SELECT col2 FROM table WHERE col1 IN (val1, val2, val3...)
...where the condition values are stored in a Python list/tuple or NumPy array of arbitrary length. The search values are determined by the user at runtime.
I've tried an approach using PyTables conditions with an in-kernel search:
options = {
    'val' : 'someval1',
    'val' : 'someval2'
}
for row in table.where('col1 == val', options):
    print row['col2']
but it's only picking the first value.
My main goal is to build the options dictionary or list dynamically at runtime and pass it to PyTables.
Upvotes: 1
Views: 518
Reputation: 376
I think Anthony had a typo in the query syntax. This should work:
In [1]: import numpy as np
In [2]: import tables as tb
In [3]: ra = np.fromiter(((i, i*2, i*3) for i in xrange(1000000)), dtype='i4,f4,f8')
In [4]: f = tb.open_file("mytable.h5", "w")
In [5]: t = f.create_table(f.root, "mytable", ra)
In [6]: cond = "|".join(["(f1 == %d)"%i for i in (2, 4, 10)])
In [7]: %timeit [r['f0'] for r in t.where(cond)]
100 loops, best of 3: 9.14 ms per loop
This is very close in speed to my best attempt with plain NumPy:
In [8]: %timeit np.where((ra['f1'] == 2)|(ra['f1'] == 4)|(ra['f1'] == 10))[0]
100 loops, best of 3: 8.32 ms per loop
And if you want to make use of the 'in' operator directly:
In [9]: %timeit [r['f0'] for r in t if r['f1'] in (2, 4, 10)]
10 loops, best of 3: 160 ms per loop
but in this case the speed is much worse, since the rows are compared one by one in Python instead of in-kernel.
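To handle a list whose length is only known at runtime (which is what the question asks for), the same join trick can be wrapped in a small helper. A minimal sketch, assuming numeric search values; the function name is just illustrative:

def where_in(table, column, values, out_col):
    # Build "(col == v1) | (col == v2) | ..." from the runtime list
    # and run it as a single in-kernel query.
    cond = " | ".join("(%s == %r)" % (column, v) for v in values)
    return [row[out_col] for row in table.where(cond)]

For example, where_in(t, 'f1', [2, 4, 10], 'f0') reproduces the query above.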
Upvotes: 0
Reputation: 3637
This is because your options dictionary only has one element in it. The right way to do this is something like:
for row in table.where('col1 == val || col1 == val2'):
    print row['col2']
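If the values are only known at runtime (and may be strings), one option is to generate both the condition string and a condvars dictionary from the list and pass them to table.where(). A minimal sketch, assuming a runtime list called search_values; the variable names are made up:

search_values = ['someval1', 'someval2', 'someval3']
# Map generated names val0, val1, ... to the user-supplied values.
condvars = dict(('val%d' % i, v) for i, v in enumerate(search_values))
# Build "(col1 == val0) | (col1 == val1) | ..." for the in-kernel query.
cond = ' | '.join('(col1 == val%d)' % i for i in range(len(search_values)))
for row in table.where(cond, condvars):
    print row['col2']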
Upvotes: 1