Michael

Reputation: 1963

Save a pandas DataFrame in a group of h5py for later use

I want to append a pandas DataFrame object to an existing h5py file, whether as a subgroup or dataset, with all the index and header information. Is that possible? I tried the following:

import pandas as pd
import h5py
f = h5py.File('f.h5', 'r+')
df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['A', 'B', 'C'], index=['X', 'Y'])
f['df'] = df

From another script, I would like to access f.h5, but the output of f['df'][()] is array([[1, 2, 3],[4, 5, 6]]), which doesn't contain the header information.

Upvotes: 1

Views: 2222

Answers (1)

rwhitt2049

Reputation: 380

You can write to an existing HDF5 file directly with pandas via DataFrame.to_hdf() and read it back with pd.read_hdf(). to_hdf() opens the file in append mode ('a') by default, so whatever is already in the file is kept. Just make sure to read and write with the same key.

To write to the h5 file:

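import pandas as pd

existing_hdf5 = "f.h5"
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  columns=['A', 'B', 'C'], index=['X', 'Y'])

# Store the frame, including its index and column labels, under the key 'df'
df.to_hdf(existing_hdf5, key='df')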

Then you can read by:

df2 = pd.read_hdf(existing_hdf5, key='df')
print(df2)

   A  B  C
X  1  2  3
Y  4  5  6
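
To connect this back to the question's setup, here is a minimal sketch of a file that is first created with h5py and then extended with pandas. The temporary path and the dataset name "raw" are made up for illustration, and it assumes h5py and PyTables are both installed. The h5py handle is closed before pandas touches the file.

```python
import os
import tempfile

import h5py
import pandas as pd

# Hypothetical path in a temporary directory, standing in for the question's f.h5.
path = os.path.join(tempfile.mkdtemp(), "f.h5")

# Create a file with h5py that already holds a dataset (the name "raw" is made up).
with h5py.File(path, "w") as f:
    f["raw"] = [10, 20, 30]

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  columns=["A", "B", "C"], index=["X", "Y"])

# The h5py handle is closed here; pandas opens the file itself.
# mode="a" is the default and keeps what is already in the file.
df.to_hdf(path, key="df", mode="a")

# Both the h5py dataset and the pandas group now live in the same file.
with h5py.File(path, "r") as f:
    keys = sorted(f.keys())
    raw = f["raw"][()].tolist()

df2 = pd.read_hdf(path, key="df")
print(keys)
print(df2)
```

Opened with h5py, the 'df' entry is a group in pandas' own layout rather than a plain dataset, so reading it back through pd.read_hdf() is the simplest way to recover the labels.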

Note that you can also make the DataFrame appendable by passing format="table" to to_hdf(). pandas' HDF5 support, in either format, requires the optional dependency PyTables.
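
A minimal sketch of the appendable variant, assuming PyTables is installed. The temporary path and the extra row 'Z' are invented for the example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file in a temporary directory, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "appendable.h5")

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  columns=["A", "B", "C"], index=["X", "Y"])
more = pd.DataFrame([[7, 8, 9]], columns=["A", "B", "C"], index=["Z"])

# format="table" stores the frame in a layout that supports appending rows.
df.to_hdf(path, key="df", format="table")

# append=True adds rows to the existing key instead of replacing it.
more.to_hdf(path, key="df", format="table", append=True)

combined = pd.read_hdf(path, key="df")
print(combined)
```

Without append=True, a second to_hdf() call on the same key replaces the stored frame. String index labels in table format get a fixed width on the first write, so appending much longer labels later can fail unless min_itemsize is set up front.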

Upvotes: 2
