I have an HDF5 file like this:
>>> dataset.store
... <class 'pandas.io.pytables.HDFStore'>
... File path: ../data/data_experiments_01-02-03.h5
... /exp01/user01 frame_table (typ->appendable,nrows->221,ncols->124,indexers->[index])
... /exp01/user02 frame_table (typ->appendable,nrows->163,ncols->124,indexers->[index])
... /exp01/user03 frame_table (typ->appendable,nrows->145,ncols->124,indexers->[index])
... /exp02/user01 frame_table (typ->appendable,nrows->194,ncols->124,indexers->[index])
... /exp02/user02 frame_table (typ->appendable,nrows->145,ncols->124,indexers->[index])
... /exp03/user03 frame_table (typ->appendable,nrows->348,ncols->124,indexers->[index])
... /exp03/user01 frame_table (typ->appendable,nrows->240,ncols->124,indexers->[index])
From this file I want to retrieve all the users (userXY) of one of the experiments (exp0Z) and append them into a single big DataFrame. I have tried store.get('exp03'), which raises the following error:
>>> store.get('exp03')
...
... ---------------------------------------------------------------------------
... TypeError Traceback (most recent call last)
... <ipython-input-109-0a2e29e9e0a4> in <module>()
... ----> 1 dataset.store.get('/exp03')
...
... /Library/Python/2.7/site-packages/pandas/io/pytables.pyc in get(self, key)
... 613 if group is None:
... 614 raise KeyError('No object named %s in the file' % key)
... --> 615 return self._read_group(group)
... 616
... 617 def select(self, key, where=None, start=None, stop=None, columns=None,
...
... /Library/Python/2.7/site-packages/pandas/io/pytables.pyc in _read_group(self, group, **kwargs)
... 1277
... 1278 def _read_group(self, group, **kwargs):
... -> 1279 s = self._create_storer(group)
... 1280 s.infer_axes()
... 1281 return s.read(**kwargs)
...
... /Library/Python/2.7/site-packages/pandas/io/pytables.pyc in _create_storer(self, group, format, value, append, **kwargs)
... 1160 else:
... 1161 raise TypeError(
... -> 1162 "cannot create a storer if the object is not existing "
... 1163 "nor a value are passed")
... 1164 else:
...
... TypeError: cannot create a storer if the object is not existing nor a value are passed
I can retrieve a single user by calling store.get('exp03/user01'), so I guess it is possible to iterate over store.keys() and manually append the retrieved DataFrames, but I wonder if it is possible to do so in a single call to store.get() or another similar method.
EDIT: Note that dataset is a class that contains my pandas.HDFStore.
Upvotes: 2
Views: 4277
Reputation: 128918
This is not implemented, though it could be a nice feature. (FYI, I would not make it the default behaviour of .get(...), because it's not explicit enough: should it ALWAYS read ALL the tables? That is too much guessing. An argument to control which sub-tables are read could work, I suppose.) If you are interested in implementing this, please post it on GitHub.
You can use some internal functions to make this pretty easy, though (and you could even pass a where to each of the selects):
In [13]: store = pd.HDFStore('test.h5',mode='w')
In [14]: store.append('df/foo1',DataFrame(np.random.randn(10,2)))
In [15]: store.append('df/foo2',DataFrame(np.random.randn(10,2)))
In [16]: pd.concat([ store.select(node._v_pathname) for node in store.get_node('df') ])
Out[16]:
0 1
0 -0.495847 -1.449251
1 -0.494721 1.572560
2 1.219985 0.280878
3 -0.419651 1.975562
4 -0.489689 -2.712342
5 -0.022466 -0.238129
6 -1.195269 -0.028390
7 -0.192648 1.220730
8 1.331892 0.950508
9 -0.790354 -0.743006
0 -0.761820 0.847983
1 -0.126829 1.304889
2 0.667949 -1.481652
3 0.030162 -0.111911
4 -0.433762 -0.596412
5 -1.110968 0.411241
6 -0.428930 0.086527
7 -0.866701 -1.286884
8 -0.649420 0.227999
9 -0.100669 -0.205232
[20 rows x 2 columns]
In [17]: store.close()
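Applied to the layout in the question, the same idea can be sketched with only the public keys() and select() methods instead of get_node(). This is a rough sketch, not a pandas feature: read_group is a hypothetical helper name, and it assumes every table under the group has compatible columns. The concat keys keep track of which user each row came from.

```python
import pandas as pd


def read_group(store, group, where=None):
    """Concatenate every table stored under `group` (hypothetical helper).

    `store` is expected to behave like an open pandas.HDFStore: keys()
    returns absolute paths such as '/exp03/user01', and select(key, where=...)
    returns a DataFrame.
    """
    prefix = "/" + group.strip("/") + "/"
    keys = sorted(k for k in store.keys() if k.startswith(prefix))
    if not keys:
        raise KeyError("no tables found under %r" % group)
    # Map the part of the path after the group (e.g. 'user01') to its frame.
    frames = {k[len(prefix):]: store.select(k, where=where) for k in keys}
    # The dict keys become an outer index level named 'user'.
    return pd.concat(frames, names=["user"])
```

With the file from the question, read_group(dataset.store, 'exp03') would return the rows of /exp03/user01 and /exp03/user03 in one DataFrame, with the user name as the outer index level. The optional where is passed unchanged to each select.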
Keep in mind, though, that if I were doing this, there is little reason to have SEPARATE nodes when the data is the same; it's MUCH more efficient to have it in a single table with, say, a field that indicates its name or id or whatever.
Almost always I use different nodes for heterogeneous data (not necessarily different dtypes, but different 'types' of data).
That said, you can organize however you like!
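As a rough illustration of the single-table layout, the sketch below tags each frame with its experiment and user and stores everything under one node. The function names, the 'exp' and 'user' column names, and the 'all' key are made up for this example, and it assumes the data has no existing columns with those names. Passing data_columns to append makes those columns queryable with where.

```python
import pandas as pd


def combine_with_labels(frames):
    """Stack frames keyed by '/expNN/userNN' into one labelled DataFrame."""
    labelled = []
    for key, df in frames.items():
        exp, user = key.strip("/").split("/")
        # Record where each row came from in two extra columns.
        labelled.append(df.assign(exp=exp, user=user))
    return pd.concat(labelled, ignore_index=True)


def write_single_table(path, combined, key="all"):
    """Store the combined frame as one appendable table (needs PyTables)."""
    with pd.HDFStore(path, mode="w") as store:
        # data_columns makes 'exp' and 'user' usable in where clauses.
        store.append(key, combined, data_columns=["exp", "user"])


def read_experiment(path, exp, key="all"):
    """Read back only the rows belonging to one experiment."""
    with pd.HDFStore(path, mode="r") as store:
        return store.select(key, where="exp == %r" % exp)
```

Reading one experiment then becomes a single select on one table, for example read_experiment(path, 'exp03'), rather than a loop over nodes.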
Upvotes: 5