Reputation: 8669
So I created hdf5 file with a simple dataset that looks like this
>>> pd.read_hdf('STORAGE2.h5', 'table')
A B
0 0 0
1 1 1
2 2 2
3 3 3
4 4 4
Using this script
import pandas as pd
import scipy as sp
from pandas.io.pytables import Term
store = pd.HDFStore('STORAGE2.h5')
df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))
df_tl.to_hdf('STORAGE2.h5','table',append=True)
I know I can select columns using
x = pd.read_hdf('STORAGE2.h5', 'table', columns=['A'])
or
x = store.select('table', where = 'columns=A')
How would I select all values in column 'A' that equals 3 or specific or indicies with strings in column 'A' like 'foo'? In pandas dataframes I would use df[df["A"]==3]
or df[df["A"]=='foo']
Also does it make a difference in efficiency if I use read_hdf()
or store.select()
?
Upvotes: 2
Views: 6895
Reputation: 129038
You need to specify data_columns=
(you can use True
as well to make all columns searchable)
(FYI, the mode='w'
will start the file over, and is just for my example)
In [50]: df_tl.to_hdf('STORAGE2.h5','table',append=True,mode='w',data_columns=['A'])
In [51]: pd.read_hdf('STORAGE2.h5','table',where='A>2')
Out[51]:
A B
3 3 3
4 4 4
Upvotes: 3