rks

Reputation: 213

Write a pandas data frame to HDF5

I'm processing a large number of files in Python and need to write the output (one dataframe for each input file) directly to HDF5. I am wondering what the best way is to write pandas dataframes from my script to HDF5 quickly. I am not sure whether any Python module, such as hdf5 or hadoopy, can do this. Any help in this regard will be appreciated.

Upvotes: 3

Views: 2947

Answers (1)

MaxU - stand with Ukraine

Reputation: 210812

It's difficult to give you a good answer to this rather generic question.

It's not clear how you are going to use (read) your HDF5 files - do you want to select data conditionally (using the where parameter)?

First of all, you need to open a store object:

import pandas as pd

store = pd.HDFStore('/path/to/filename.h5')

Now you can write (or append) to the store. I'm using blosc compression here - it's pretty fast and efficient. Besides that, I will use the data_columns parameter to specify the columns that must be indexed, so that you can use those columns in the where parameter later when you read your HDF5 file:

for f in files:
    # read or process each file into a separate DataFrame `df`
    store.append('df_identifier_AKA_key', df,
                 data_columns=list_of_indexed_cols,  # columns to index for `where` queries
                 complevel=5, complib='blosc')

store.close()
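For completeness, here is a minimal sketch of reading the data back with a conditional where clause, which is the main reason to pass data_columns when writing. The key matches the one used above; `some_indexed_col` is a hypothetical column name standing in for one of your indexed columns:

import pandas as pd

store = pd.HDFStore('/path/to/filename.h5')
# select only the rows where the indexed column satisfies the condition;
# this filters inside the HDF5 file instead of loading everything into memory
subset = store.select('df_identifier_AKA_key', where='some_indexed_col > 0')
store.close()

The same read can be done in one call with pd.read_hdf('/path/to/filename.h5', 'df_identifier_AKA_key', where='some_indexed_col > 0').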

Upvotes: 2
