Reputation: 13930
I got all my data into an HDFStore (yeah!), but how do I get it out again?
I've saved 6 DataFrames as frame_table in my HDFStore. Each of these tables looks like the following, but the length varies (date is a Julian date).
>>> a = store.select('var1')
>>> a.head()
                       var1
x_coor y_coor date
928    310    2006257   133
932    400    2006257   236
939    311    2006257   253
941    312    2006257   152
942    283    2006257    68
Then I select from all my tables the values where the date is, e.g., > 2006256.
>>> b = store.select_as_multiple(['var1', 'var2', 'var3', 'var4', 'var5', 'var6'], where=(pd.Term('date', '>', date)), selector='var1')
>>> b.head()
                       var1   var2  var3  var4  var5  var6
x_coor y_coor date
928    310    2006257   133  14987  7045    18   240   171
              2006273   136      0  7327    30   253   161
              2006289   125      0  -239    83   217   168
              2006305    95  14604  6786    13   215    57
              2006321    84      0  4548    13   133    88
This works, but only for relatively small .h5 files. For my normal .h5 files I would like to store the result temporarily in an HDFStore using chunksize (since I also have to add a new column to it based on this selection). I tried it like this (using this):
for df in store.select_as_multiple(['var1', 'var2', 'var3', 'var4', 'var5', 'var6'], where=(pd.Term('date', '>', date)), selector='var1', chunksize=15):
    tempstore.put('test', pd.DataFrame(df))
But then only one chunk ends up in the store. And with:
    tempstore.append('test', pd.DataFrame(df))
I get ValueError: Can only append to Tables. What am I doing wrong?
Upvotes: 2
Views: 1750
Reputation: 129068
When you tried to do this with put, it kept overwriting the store (with the latest chunk). You then got the error on append because the node already existed as a storer, and you can't append to a storer / non-table.
That is:
put writes a single, non-appendable fixed format (called a storer), which is fast to write, but you cannot append to it or query it (you can only get it in its entirety).
append creates a table format, which is what you want here (and what a frame_table is).
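To make the difference concrete, here is a minimal sketch. The file name, the node names fixed_node and table_node, and the three tiny frames are all made up for illustration; they just stand in for the chunks coming out of select_as_multiple. It assumes PyTables is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: three small frames standing in for the chunks
# yielded by select_as_multiple(..., chunksize=...).
chunks = [
    pd.DataFrame({"var1": [i, i + 1]}, index=[2 * i, 2 * i + 1])
    for i in range(3)
]

# Throwaway file so the sketch does not touch any real store.
path = os.path.join(tempfile.mkdtemp(), "temp.h5")

with pd.HDFStore(path, mode="w") as tempstore:
    # put: every call replaces the node, so only the last chunk survives.
    for df in chunks:
        tempstore.put("fixed_node", df)
    rows_after_put = len(tempstore["fixed_node"])

    # Appending to that fixed-format node is rejected.
    try:
        tempstore.append("fixed_node", chunks[0])
        append_error = None
    except ValueError as exc:
        append_error = str(exc)

    # append: creates a table on the first call and grows it afterwards.
    for df in chunks:
        tempstore.append("table_node", df)
    rows_after_append = len(tempstore["table_node"])

print(rows_after_put, rows_after_append)
print(append_error)
```

The put loop should leave only the last chunk in the node, while the append loop should accumulate all of them.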
Note: you don't need to do pd.DataFrame(df), as df is already a frame.
So, first remove the existing 'test' node if it's there:
if 'test' in tempstore:
    tempstore.remove('test')
Then append each DataFrame:
for df in store.select_as_multiple(.....):
    tempstore.append('test', df)
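Putting the pieces together, an end-to-end sketch could look like the following. Everything about the data is an assumption for illustration: two small tables instead of six, a plain date column instead of the MultiIndex, a made-up total column as the "new column", and throwaway file names. The where clause is written as a string, which is the form current pandas versions expect instead of pd.Term.

```python
import os
import tempfile

import numpy as np
import pandas as pd

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "source.h5")  # hypothetical source file
tmp_path = os.path.join(workdir, "temp.h5")    # hypothetical temp file

# Hypothetical stand-in data: two tables with the same rows in the same order.
dates = np.array([2006241, 2006257, 2006273, 2006289, 2006305])
var1 = pd.DataFrame({"date": dates, "var1": [133, 236, 253, 152, 68]})
var2 = pd.DataFrame({"var2": [14987, 0, 0, 14604, 0]})

with pd.HDFStore(src_path, mode="w") as store:
    # The selector table needs 'date' as a data column to be queryable.
    store.append("var1", var1, data_columns=["date"])
    store.append("var2", var2)

with pd.HDFStore(src_path, mode="r") as store, \
        pd.HDFStore(tmp_path, mode="w") as tempstore:
    # Start from a clean node so old chunks are not mixed in.
    if "test" in tempstore:
        tempstore.remove("test")

    chunk_iter = store.select_as_multiple(
        ["var1", "var2"],
        where="date > 2006256",
        selector="var1",
        chunksize=2,
    )
    for df in chunk_iter:
        # Add the derived column per chunk, then append to the table.
        df["total"] = df["var1"] + df["var2"]
        tempstore.append("test", df)

    result = tempstore["test"]

print(result)
```

Because every chunk is appended to a table, the whole selection never has to sit in memory at once; only chunksize rows are held at a time.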
Upvotes: 5