Jim Knoll

Reputation: 115

Pytables table into pandas DataFrame

There is lots of information on how to read a CSV into a pandas DataFrame, but what I have is a PyTables table, and I want a pandas DataFrame.

I've found how to store my pandas DataFrame in PyTables, but when I want to read it back, at that point the stored node has:

"kind = v._v_attrs.pandas_type"  

I could write it out as csv and re-read it in but that seems silly. It is what I am doing for now.

How should I be reading PyTables objects into pandas?

Upvotes: 6

Views: 10476

Answers (2)

Andy Hayden

Reputation: 375675

The docs now include an excellent section on using the HDF5 store and there are some more advanced strategies discussed in the cookbook.

It's now relatively straightforward:

In [1]: store = HDFStore('store.h5')

In [2]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
Empty

In [3]: df = DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [4]: store['df'] = df

In [5]: store
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[2,2])

And to retrieve from HDF5/pytables:

In [6]: store['df']  # store.get('df') is an equivalent
Out[6]:
   A  B
0  1  2
1  3  4
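For Python 3 and a current pandas, the same round trip can be written as a plain script. This is a minimal sketch, assuming PyTables is installed (pandas' HDF5 support is built on it); the file name store.h5 follows the session above, and the variable names are illustrative only. `pd.read_hdf` is a shortcut that opens the file, reads one key, and closes it again.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

# Write the frame under the key 'df'; the context manager closes the file.
with pd.HDFStore('store.h5', mode='w') as store:
    store['df'] = df  # equivalent to store.put('df', df)

# Read it back through the store object...
with pd.HDFStore('store.h5', mode='r') as store:
    df_from_store = store['df']

# ...or with the one-call shortcut.
df_from_read_hdf = pd.read_hdf('store.h5', 'df')
print(df_from_read_hdf)
```

Using `with` blocks avoids leaving the HDF5 file open, which the interactive session above never closes.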

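You can also query within a table, provided the frame was written in table format (for example with store.append, or store.put with format='table').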
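A minimal sketch of such a query, assuming a reasonably recent pandas with PyTables installed; the file name query_store.h5 and the sample data are hypothetical. `data_columns=True` makes every column usable in the `where` condition.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7], 'B': [2, 4, 6, 8]})

with pd.HDFStore('query_store.h5', mode='w') as store:
    # format='table' writes a queryable PyTables Table (the default 'fixed'
    # format cannot be queried); data_columns=True indexes every column.
    store.put('df', df, format='table', data_columns=True)

    # Only the rows matching the condition are returned.
    subset = store.select('df', where='A > 3')

print(subset)
```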

Upvotes: 5

meteore
meteore

Reputation: 4775

import tables as pt
import pandas as pd
import numpy as np

# the content is junk but we don't care
# (a 1-D structured array: one record per table row)
grades = np.empty(10, dtype=[('name', 'S20'), ('grade', 'u2')])

# write to a PyTables table; PyTables >= 3.0 uses snake_case names
# (open_file/create_table instead of openFile/createTable)
with pt.open_file('/tmp/test_pandas.h5', 'w') as handle:
    handle.create_table('/', 'grades', obj=grades)
    print(handle.root.grades[:].dtype)  # it is a structured array

    # load back as a DataFrame and check types
    df = pd.DataFrame.from_records(handle.root.grades[:])
    print(df.dtypes)

Beware that, depending on your pandas version, your u2 (unsigned 2-byte integer) column may end up as i8 (8-byte integer), and the strings will become object columns, because pandas did not (at the time of writing) support the full range of dtypes available for NumPy arrays.
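If the converted dtypes matter, they can be fixed up after loading. This is a minimal sketch, assuming UTF-8 (or ASCII) names; the two sample rows are hypothetical stand-ins for what `handle.root.grades[:]` would return from the file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; in practice use handle.root.grades[:]
grades = np.array([(b'alice', 90), (b'bob', 85)],
                  dtype=[('name', 'S20'), ('grade', 'u2')])

df = pd.DataFrame.from_records(grades)
print(df.dtypes)  # inspect what your pandas version produced

# Fixed-width 'S20' fields arrive as Python bytes objects; decode them to str.
df['name'] = df['name'].str.decode('utf-8')
# Cast back to the on-disk integer width in case it was upcast.
df['grade'] = df['grade'].astype(np.uint16)
```

Casting explicitly makes the result independent of whatever default conversion a given pandas release applies.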

Upvotes: 7
