ccsv

Reputation: 8669

Python pandas Reading specific values from HDF5 files using read_hdf and HDFStore.select

I created an HDF5 file with a simple dataset that looks like this:

>>> pd.read_hdf('STORAGE2.h5', 'table')
   A  B
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4

Using this script

import pandas as pd

store = pd.HDFStore('STORAGE2.h5')

df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))

df_tl.to_hdf('STORAGE2.h5', key='table', append=True)

I know I can select columns using

x = pd.read_hdf('STORAGE2.h5', 'table',  columns=['A'])

or

x = store.select('table', where = 'columns=A')

How would I select all rows where column 'A' equals 3, or rows where column 'A' holds a specific string such as 'foo'? In a pandas DataFrame I would use df[df["A"]==3] or df[df["A"]=='foo'].

Also does it make a difference in efficiency if I use read_hdf() or store.select()?

Upvotes: 2

Views: 6895

Answers (1)

Jeff

Reputation: 129038

You need to specify data_columns= when writing the table; only data columns (and the index) can be used in a where clause. You can also pass data_columns=True to make all columns searchable.

(FYI, the mode='w' will start the file over, and is just for my example)

In [50]: df_tl.to_hdf('STORAGE2.h5',key='table',append=True,mode='w',data_columns=['A'])

In [51]: pd.read_hdf('STORAGE2.h5','table',where='A>2')
Out[51]: 
   A  B
3  3  3
4  4  4
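
The same approach should also cover the equality and string cases from the question. Here is a minimal sketch, assuming PyTables is installed; the string column C and the scratch file name example.h5 are made up for illustration and are not part of the original data.

```python
import os
import tempfile

import pandas as pd

# Illustrative frame: column "C" holds strings and is an assumption,
# added only to show string matching.
df = pd.DataFrame(
    {"A": range(5), "B": range(5), "C": ["foo", "bar", "foo", "baz", "bar"]}
)

# Hypothetical scratch file so the example does not touch STORAGE2.h5.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# format="table" with data_columns=True makes every column queryable on disk.
df.to_hdf(path, key="table", format="table", mode="w", data_columns=True)

# Equality on a numeric data column.
eq_three = pd.read_hdf(path, "table", where="A == 3")

# Equality on a string data column: quote the literal inside the where string.
foo_rows = pd.read_hdf(path, "table", where='C == "foo"')

# The same string query through an open store, returning only column B.
with pd.HDFStore(path, mode="r") as store:
    foo_b = store.select("table", where='C == "foo"', columns=["B"])

print(eq_three)
print(foo_rows)
print(foo_b)
```

On efficiency: as far as I know, read_hdf opens the file, runs the same select on an HDFStore, and closes it again, so a single query should cost about the same either way. Keeping a store open mainly helps when you run many queries against the same file.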

Upvotes: 3
