Mario

Reputation: 1976

How can I extract data from an .h5 file and save it as .txt or .csv properly?

After a lot of searching, I couldn't find a simple way to extract data from an .h5 file and load it into a DataFrame with NumPy or Pandas so that I can save it as a .txt or .csv file.

import h5py
import numpy as np
import pandas as pd

filename = r'D:\data.h5'
f = h5py.File(filename, 'r')

# List all groups
print("Keys: %s" % f.keys())
a_group_key = list(f.keys())[0]

# Get the data
data = list(f[a_group_key])
pd.DataFrame(data).to_csv("hi.csv")

Keys: <KeysViewHDF5 ['dd48']>

When I print data, I see the following result:

print(data)
['axis0',
 'axis1',
 'block0_items',
 'block0_values',
 'block1_items',
 'block1_values']

I would appreciate it if someone could explain what these are and how I can extract the complete data and save it to a .csv file. There doesn't seem to be a routine way to do this, and it is still quite challenging. So far I have only been able to see part of the data via:

import numpy as np 
dfm = np.fromfile('D:\data.h5', dtype=float)
print (dfm.shape)
print(dfm[5:])

dfm=pd.to_csv('train.csv')
#dfm.to_csv('hi.csv', sep=',', header=None, index=None)

I expect to extract the time_stamps and measurements stored in the .h5 file.

Upvotes: 2

Views: 13596

Answers (2)

kcw78

Reputation: 8046

h5py exposes HDF5 datasets as NumPy arrays. Your call to get the keys returns a list of the dataset names. Now that you have them, it should be straightforward to read each one as a NumPy array and write it out. You need to check the dtype of each dataset to know what it holds and to format the output correctly.

Updated 5/22/2019 to reflect the content of data.h5 posted at the link in the comment. The default format in np.savetxt() is '%.18e'. The code below uses very simple (crude) logic to pick a format based on dtype for these datasets; general use requires more robust dtype checking and formatting. You will also need to add logic to decode unicode strings.

import h5py
import numpy as np

filename = r'D:\data.h5'  # raw string so the backslash is taken literally

with h5py.File(filename, 'r') as h5f:
    # Get a list of the dataset names in group 'dd48'
    a_dset_keys = list(h5f['dd48'].keys())

    # Write each dataset to its own CSV file
    for dset in a_dset_keys:
        ds_data = h5f['dd48'][dset]
        print('dataset=', dset)
        print(ds_data.dtype)
        # Crude format selection based on dtype
        if ds_data.dtype == 'float64':
            csvfmt = '%.18e'
        elif ds_data.dtype == 'int64':
            csvfmt = '%.10d'  # at least 10 digits, zero-padded
        else:
            csvfmt = '%s'
        np.savetxt('output_' + dset + '.csv', ds_data, fmt=csvfmt, delimiter=',')
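
The string decoding mentioned above could be handled along these lines. This is only a rough sketch: decode_bytes is a hypothetical helper (not part of h5py or NumPy), and it assumes the string datasets come back either as fixed-length byte arrays (dtype kind 'S') or as object arrays holding bytes, encoded as UTF-8.

```python
import numpy as np


def decode_bytes(arr, encoding="utf-8"):
    """Return a copy of arr with byte strings decoded to str (hypothetical helper)."""
    arr = np.asarray(arr)
    if arr.dtype.kind == "S":
        # Fixed-length byte strings, e.g. dtype('S11')
        return np.char.decode(arr, encoding)
    if arr.dtype.kind == "O":
        # Object arrays may mix bytes with other values; decode only the bytes
        decoded = [x.decode(encoding) if isinstance(x, bytes) else x
                   for x in arr.ravel()]
        return np.array(decoded, dtype=object).reshape(arr.shape)
    # Numeric data is returned unchanged
    return arr
```

In the loop above, you could pass decode_bytes(ds_data[()]) to np.savetxt() instead of ds_data, so that the '%s' branch writes plain text rather than the b'...' representation.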

Upvotes: 0

John Zwinck

Reputation: 249542

It looks like that data was written by Pandas, so use pd.read_hdf() to read it.
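
Something like the following might be enough. It is a minimal sketch, assuming the file really was written by Pandas and that the PyTables package (which pd.read_hdf() depends on) is installed. hdf_to_csv is a hypothetical helper name, the key 'dd48' is taken from the question's output, and the output file name is arbitrary.

```python
import pandas as pd


def hdf_to_csv(h5_path, csv_path, key=None):
    """Load a Pandas-written HDF5 object and save it as CSV (hypothetical helper)."""
    # key names the stored object; it can be left as None only if the
    # file contains a single Pandas object.
    df = pd.read_hdf(h5_path, key=key)
    df.to_csv(csv_path)
    return df


# Example call, using the path and key from the question:
# df = hdf_to_csv(r'D:\data.h5', 'data.csv', key='dd48')
# print(df.head())
```

The axis0, axis1, block0_items and block0_values entries are how Pandas lays out a DataFrame internally in the file, so reading through Pandas puts the labels and values back together for you.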

Upvotes: 0
