Reputation: 221
I am looking for sample code that converts .h5 files to CSV or TSV: I need to read an .h5 file and write the output as CSV or TSV.
Sample code would be much appreciated. Please help, as I have been stuck on this for the last few days. I looked at the wrapper classes but don't know how to use them. I am not a good programmer, so I am facing a lot of problems.
Please help, thanks a lot in advance.
Upvotes: 15
Views: 59588
Reputation: 1387
If you don't know the structure of the h5 file, you can examine it by finding the first key. This is often a single group that holds further keys for the labels or the actual data.
This example uses an h5 file of LA traffic data from: https://drive.google.com/drive/folders/10FOTa6HXPqX8Pf5WRoRwcFnW9BrNZEIX
The code below reads the unknown h5 file and explores it by its keys. Here the first key is df, which wraps the other entries such as axis0 and axis1:
import h5py

# h5 file path
filename = 'metr-la.h5'

# open the h5 file read-only; the with block closes it afterwards
with h5py.File(filename, 'r') as dataset:
    # print the top-level keys of the h5 file (here: 'df')
    print(list(dataset.keys()))
    # print the keys inside the first key, such as axis0 and axis1
    df = dataset['df']
    print(list(df.keys()))
    # print a sub key such as axis0 and the names of its attributes
    print("axis0 data: {}".format(df['axis0']))
    print("axis0 data attributes: {}".format(list(df['axis0'].attrs)))
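If you would rather not walk the keys by hand, h5py can visit every object in the file for you. The sketch below (describe_h5 is just an illustrative name) uses Group.visititems to list every dataset with its shape and dtype, which shows which keys hold data worth exporting:

```python
import h5py


def describe_h5(path):
    """Return (name, shape, dtype) for every dataset in an HDF5 file.

    Walks all groups recursively with h5py's visititems, so nested
    datasets such as df/axis0 are found without knowing the keys upfront.
    """
    found = []

    def collect(name, obj):
        # visititems calls this for every group and dataset; keep datasets only
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))
        # returning None tells h5py to keep visiting

    with h5py.File(path, "r") as f:
        f.visititems(collect)
    return found
```

For the file above, print(describe_h5('metr-la.h5')) should list nested names such as df/axis0 and df/axis1.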
Save the entire h5 file to csv with pandas HDFStore using the first key df:
import pandas as pd

# h5 file path (same file as above); pd.HDFStore needs the PyTables package
filename = 'metr-la.h5'

# save the h5 file to csv using the first key df
with pd.HDFStore(filename, 'r') as d:
    df = d.get('df')
    df.to_csv('metr-la.csv')
You can also save parts of the data using the different sub keys.
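As a sketch of saving parts of the file, the hypothetical helper below writes every numeric 1-D or 2-D dataset directly under one group to its own CSV file. It assumes the output directory already exists and simply skips sub-groups and non-numeric datasets (for example byte-string label arrays):

```python
import os

import h5py
import numpy as np


def export_group_to_csv(h5_path, group_name, out_dir, delimiter=","):
    """Write each numeric 1-D/2-D dataset under `group_name` to
    <out_dir>/<dataset name>.csv and return the paths written."""
    written = []
    with h5py.File(h5_path, "r") as f:
        for name, obj in f[group_name].items():
            # skip sub-groups and datasets that are not a 1-D/2-D table
            if not isinstance(obj, h5py.Dataset) or obj.ndim not in (1, 2):
                continue
            # keep signed ints, unsigned ints and floats only
            if obj.dtype.kind not in "iuf":
                continue
            out_path = os.path.join(out_dir, name + ".csv")
            # fmt="%s" prints each value with str(), so ints stay ints
            np.savetxt(out_path, obj[()], fmt="%s", delimiter=delimiter)
            written.append(out_path)
    return written
```

For example, export_group_to_csv('metr-la.h5', 'df', '.') would export the numeric datasets under df; which ones exist depends on how the file was written.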
Upvotes: 0
Reputation: 3393
Using pandas' HDFStore worked for me, while read_hdf did not:
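import h5py
import pandas as pd

# collect the name of every group and dataset in the file
paths = []
with h5py.File('examples/test.h5', 'r') as hf:
    hf.visit(paths.append)

# paths[1] is specific to this file; print(paths) to find the right key for yours
with pd.HDFStore('examples/test.h5', 'r') as store:
    dt = store.get(paths[1])
dt.to_csv('test.csv')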
Upvotes: 0
Reputation: 5645
Another Python solution using pandas:
#!/usr/bin/env python3
import sys

import pandas as pd

fpath = sys.argv[1]
if len(sys.argv) > 2:
    key = sys.argv[2]
    df = pd.read_hdf(fpath, key=key)
else:
    df = pd.read_hdf(fpath)
df.to_csv(sys.stdout, index=False)
This script is available here
The first argument to this script is the HDF5 file. If a second argument is passed, it is used as the key of the object to read from the file; otherwise read_hdf reads the file's only object (it raises an error if the file holds more than one). The script dumps the CSV to stdout, which you can redirect to a file.
For example, if your data is stored in an HDF5 file called data.h5 and you have saved this script as hdf2df.py, then
$ python3 hdf2df.py data.h5 > data.csv
will write the data to a CSV file, data.csv.
Upvotes: 4
Reputation: 19
import numpy as np
import h5py

with h5py.File('chunk0003.hdf5', 'r') as hf:
    print('List of arrays in this file: \n', list(hf.keys()))
    # This lists arrays in the file ['_self_key', 'chrms1', 'chrms2', 'cuts1', 'cuts2', 'misc', 'strands1', 'strands2']
    # read each 1-D dataset completely into a NumPy array
    a = hf['chrms1'][:]
    b = hf['chrms2'][:]
    c = hf['cuts1'][:]
    d = hf['cuts2'][:]
    e = hf['strands1'][:]
    f = hf['strands2'][:]

# one row per dataset, then transpose so each dataset becomes a column
table = np.array([a, b, c, d, e, f])
table2 = table.transpose()
np.savetxt('chunk0003.txt', table2, delimiter='\t')
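np.array([a, b, c, d, e, f]) casts all six columns to one common dtype, and np.savetxt's default format then prints every value in scientific notation. A sketch of an alternative, assuming equal-length 1-D numeric datasets (datasets_to_tsv is an illustrative name), writes the columns with Python's csv module, adds a header row with the dataset names and keeps integers as integers:

```python
import csv

import h5py


def datasets_to_tsv(h5_path, names, out_path):
    """Write equal-length 1-D numeric datasets as TSV columns with a header row."""
    with h5py.File(h5_path, "r") as f:
        columns = [f[name][()] for name in names]  # read each dataset fully
    lengths = {len(col) for col in columns}
    if len(lengths) != 1:
        raise ValueError(f"datasets have different lengths: {sorted(lengths)}")
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(names)  # header: one column per dataset name
        # zip(*columns) yields one row at a time; .item() turns each NumPy
        # scalar into a plain Python int/float so it prints without e-notation
        for row in zip(*columns):
            writer.writerow([cell.item() for cell in row])
```

For the file above, datasets_to_tsv('chunk0003.hdf5', ['chrms1', 'chrms2', 'cuts1', 'cuts2', 'strands1', 'strands2'], 'chunk0003.tsv') would write the same six columns with a header line.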
Upvotes: 1
Reputation: 33
Example of HDF5 to CSV conversion can be found at https://github.com/amgreenstreet/Million-Song-Dataset-HDF5-to-CSV
It uses Python and converts Million Songs Dataset from HDF5 to CSV format.
I strongly recommend using the Python(x,y) distribution http://python-xy.github.io/ because this example uses additional Python packages such as NumPy and PyTables, which Python(x,y) includes.
Upvotes: 1
Reputation: 6658
You can also use h5dump:
h5dump -o dset.asci -y -w 400 dset.h5
-o dset.asci specifies the output file.
-y omits the array indices from the output, and -w 400 sets the width of each output line in characters. The width should be at least the dimension size multiplied by the number of characters (digits plus separators) needed to print each value, so that each row stays on one line; you can safely choose a very large number here.
dset.h5 is, of course, the HDF5 file you want to convert.
This converts it to an ASCII file, which is easily imported into Excel, from where you can save it as a .csv (use Save As in Excel and choose the CSV file format). I did this a couple of times, and it worked for me. source
Upvotes: 3
Reputation: 249153
Python:
import sys

import numpy as np
import h5py

np.savetxt(sys.stdout, h5py.File('foo.h5', 'r')['dataname'], '%g', ',')
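Some notes:
sys.stdout can be replaced with a file name such as "out.csv".
Use '\t' instead of ',' to get TSV.
Replace dataname with the name of the dataset you want to export (dataname is a placeholder).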
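The one-liner above leaves the file open and always uses commas. A slightly longer sketch, assuming a numeric 1-D or 2-D dataset, opens the file read-only in a with block and picks a tab delimiter when the output name ends in .tsv; the extension rule and the name dataset_to_text are conventions made up for this example, not part of h5py or NumPy:

```python
import os

import h5py
import numpy as np


def dataset_to_text(h5_path, dataset_name, out_path):
    """Dump one numeric 1-D/2-D dataset to CSV, or to TSV if out_path ends in .tsv."""
    # hypothetical convention: choose the delimiter from the output extension
    is_tsv = os.path.splitext(out_path)[1].lower() == ".tsv"
    delimiter = "\t" if is_tsv else ","
    # explicit read-only mode; the with block closes the file afterwards
    with h5py.File(h5_path, "r") as f:
        data = f[dataset_name][()]
    # same '%g' format as the one-liner above
    np.savetxt(out_path, data, fmt="%g", delimiter=delimiter)
```

With the names from the answer, dataset_to_text('foo.h5', 'dataname', 'out.csv') writes CSV and dataset_to_text('foo.h5', 'dataname', 'out.tsv') writes TSV.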
Upvotes: 0