Reputation: 179
I need to query an HDF5 file with a where clause containing three conditions; one of the conditions involves a list with a length of 30:
myList = list(xrange(30))
h5DF = pd.read_hdf(h5Filename, 'df', where='index=myList & date=dateString & time=timeString')
The query above gives me ValueError: too many inputs, and the error is reproducible.
If I reduce the length of the list to 29 (keeping three conditions), it executes fine:
myList = list(xrange(29))
h5DF = pd.read_hdf(h5Filename, 'df', where='index=myList & date=dateString & time=timeString')
It also executes fine if I reduce the number of conditions to two (keeping the list length of 30):
myList = list(xrange(30))
h5DF = pd.read_hdf(h5Filename, 'df', where='index=myList & time=timeString')
Is this a known limitation? The pandas documentation at http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.pytables.read_hdf.html doesn't mention this limitation, and after searching this forum it seems nobody has encountered it yet.
The version is pandas 0.15.2. Any help is appreciated.
Upvotes: 2
Views: 5736
Reputation: 129018
This is answered here.
This is a defect: numpy/numexpr cannot handle more than 31 operands in the expression tree. An expression like foo=[1,2,3,4] in the where of the HDFStore generates an expression like (foo==1) | (foo==2) | (foo==3) | (foo==4). The list is expanded into these OR'ed comparisons, and if there are too many of them the evaluation can fail.
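As a rough illustration, here is a sketch of what that expansion looks like (this builds the string by hand; the exact expression pandas generates internally may differ):
myList = list(range(30))
# 'index=myList' is expanded into a chain of OR'ed equality comparisons:
expanded = " | ".join("(index == %d)" % v for v in myList)
# -> "(index == 0) | (index == 1) | ... | (index == 29)"
# ANDing this with the date and time conditions pushes the expression tree
# past the ~31-operand limit, hence "ValueError: too many inputs".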
HDFStore handles the single-operand case (in other words, if you just have foo=range(31) on its own, it is OK), but because you happen to have a nested sub-expression whose sub-nodes are themselves too long, it errors.
Generally, a better way to do this is to select a bigger range (e.g. the end-points of the selection for each operand), then do an in-memory .isin. It might even be faster, because HDF5 tends to be more efficient IMHO when selecting larger ranges (even though you are bringing more data into memory) than when making many individual selections.
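For example, here is a minimal sketch of that approach (assuming, as in the question, that h5Filename, dateString, and timeString are already defined; lo and hi are just the end-points of the list):
import pandas as pd

myList = list(range(30))
lo, hi = min(myList), max(myList)

# Select a superset using only the range end-points plus the other two
# conditions; this keeps the expression tree small enough for numpy/numexpr.
h5DF = pd.read_hdf(h5Filename, 'df',
                   where='index>=lo & index<=hi & date=dateString & time=timeString')

# Then refine to the exact membership in memory with .isin
h5DF = h5DF[h5DF.index.isin(myList)]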
Upvotes: 5