Sanjay Tiwari

Reputation: 221

Converting hdf5 to csv or tsv files

I am looking for sample code that converts .h5 (HDF5) files to CSV or TSV. It has to read an .h5 file and write the output as CSV or TSV.

Sample code would be much appreciated. I have been stuck on this for the last few days. I looked at the wrapper classes but don't know how to use them, and I am not an experienced programmer, so I am running into a lot of problems.

Please help. Thanks a lot in advance.

Upvotes: 15

Views: 59588

Answers (7)

ThomasAFink

Reputation: 1387

If you don't know the structure of the h5 file, you can examine it by looking up its first key, which is often a single group that holds further keys for the labels and for the actual data.

This example uses an h5 file of LA traffic data from: https://drive.google.com/drive/folders/10FOTa6HXPqX8Pf5WRoRwcFnW9BrNZEIX

Read and explore the unknown h5 file by its keys. Here the first key is df, which wraps the other entries such as axis0 and axis1:

import h5py

# h5 file path
filename = 'metr-la.h5'

# read the h5 file
dataset = h5py.File(filename, 'r')

# print the top-level keys of the unknown h5 file
print(dataset.keys())

# print the keys inside the first key
df = dataset['df']
print(df.keys())  # prints sub keys such as axis0 and axis1

# print one sub key (axis0) and the names of its attributes
print("axis0 data: {}".format(df['axis0']))
print("axis0 data attributes: {}".format(list(df['axis0'].attrs)))

# close the file once you are done exploring
dataset.close()

Save the entire h5 file to csv with pandas HDFStore using the first key df:

import pandas as pd

# h5 file path
filename = 'metr-la.h5'

# save the h5 file to csv using the first key df
with pd.HDFStore(filename, 'r') as d:
    df = d.get('df')
    df.to_csv('metr-la.csv')

You can also save parts of the data using the different sub keys.
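As a rough sketch of saving only part of the data through the sub keys, the snippet below reads two of them directly with h5py. It assumes the layout pandas writes in its fixed format, where axis0 holds the column labels and block0_values holds the values of a single block. The sub key block0_values, the function name export_values, and the max_rows parameter are assumptions made for illustration, so check df.keys() on your own file first.

```python
import csv

import h5py


def export_values(h5_path, csv_path, group="df", max_rows=None):
    """Write <group>/block0_values to CSV, using <group>/axis0 as the header.

    Assumes a single value block whose columns line up with axis0.
    """
    with h5py.File(h5_path, "r") as f:
        grp = f[group]
        # axis0 holds the column labels; byte strings are decoded for the header.
        header = [
            v.decode() if isinstance(v, bytes) else str(v)
            for v in grp["axis0"][:]
        ]
        # Slicing reads only the requested rows; None means all rows.
        values = grp["block0_values"][:max_rows]

    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(values.tolist())
```

For example, export_values('metr-la.h5', 'metr-la-head.csv', max_rows=100) would write only the first 100 rows, provided the file really has that layout.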

Upvotes: 0

jsta

Reputation: 3393

Using pandas HDFStore worked for me while read_hdf did not:

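import h5py
import pandas as pd

paths = []
with h5py.File('examples/test.h5', 'r') as hf:
    hf.visit(paths.append)  # collect the name of every group and dataset

# paths[1] was the key of the table in this file; pick the entry that matches yours
with pd.HDFStore('examples/test.h5', 'r') as store:
    dt = store.get(paths[1])
dt.to_csv('test.csv')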

Upvotes: 0

Dilawar

Reputation: 5645

Another python solution using pandas.

#!/usr/bin/env python3

import pandas as pd
import sys
fpath = sys.argv[1]
if len(sys.argv)>2:
    key = sys.argv[2]
    df = pd.read_hdf(fpath, key=key)
else:
    df = pd.read_hdf(fpath)

df.to_csv(sys.stdout, index=False)

This script is available here

The first argument to this script is the HDF5 file. If a second argument is passed, it is used as the key of the object to read from the file (the key argument of pd.read_hdf); otherwise pd.read_hdf is called without a key, which works when the file contains a single pandas object. The script writes the CSV to stdout, which you can redirect to a file.

For example, if your data is stored in an HDF5 file called data.h5 and you have saved this script as hdf2df.py, then

$ python3 hdf2df.py data.h5 > data.csv

will write the data to a CSV file data.csv.

Upvotes: 4

Dhamma Bharne

Reputation: 19

import numpy as np
import h5py

with h5py.File('chunk0003.hdf5', 'r') as hf:
    # List the arrays in this file, here:
    # ['_self_key', 'chrms1', 'chrms2', 'cuts1', 'cuts2', 'misc', 'strands1', 'strands2']
    print('List of arrays in this file: \n', list(hf.keys()))
    # Read the six arrays that become the columns (they must have equal length)
    names = ['chrms1', 'chrms2', 'cuts1', 'cuts2', 'strands1', 'strands2']
    columns = [hf[name][:] for name in names]

# One row per record, one column per array, written tab-separated
table = np.array(columns).transpose()
np.savetxt('chunk0003.txt', table, delimiter='\t')

Upvotes: 1

SmallerThan

Reputation: 33

An example of HDF5 to CSV conversion can be found at https://github.com/amgreenstreet/Million-Song-Dataset-HDF5-to-CSV

It uses Python and converts the Million Song Dataset from HDF5 to CSV format.

I strongly recommend using the Python(x,y) distribution, http://python-xy.github.io/, because this example depends on additional Python packages such as NumPy and PyTables, which Python(x,y) includes.

Upvotes: 1

Mathias711

Reputation: 6658

You can also use h5dump -o dset.asci -y -w 400 dset.h5

  • -o dset.asci specifies the output file
  • -y suppresses the array indices in the output, and -w 400 sets the width of each output line in characters. The width should cover the dimension size multiplied by the number of positions and spaces needed to print each value, so choose a very large number here.
  • dset.h5 is the HDF5 file you want to convert

This converts it to an ASCII file, which is easy to import into Excel, from where you can save it as a .csv (use Save As in Excel and choose the file format). I did this a couple of times, and it worked for me. source

Upvotes: 3

John Zwinck

Reputation: 249153

Python:

import sys

import numpy as np
import h5py

np.savetxt(sys.stdout, h5py.File('foo.h5', 'r')['dataname'], '%g', ',')

Some notes:

  1. sys.stdout can be any file, or a file name string like "out.csv".
  2. %g is used to make the formatting human-friendly.
  3. If you want TSV just use '\t' instead of ','.
  4. I've assumed you have a single dataset name within the file (dataname).
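If the file holds more than one dataset, note 4 no longer applies. A minimal sketch of one way to handle that case walks the file with h5py's visititems and writes each dataset to its own file. The function name dump_all_datasets, the output naming scheme, the ext parameter, and the restriction to numeric 1-D and 2-D datasets are assumptions made for this example.

```python
import os

import h5py
import numpy as np


def dump_all_datasets(h5_path, out_dir, delimiter=",", ext=".csv"):
    """Write every numeric 1-D or 2-D dataset in the file to its own text file."""
    os.makedirs(out_dir, exist_ok=True)
    written = []

    def visit(name, obj):
        # Skip groups and anything np.savetxt cannot write directly.
        if not isinstance(obj, h5py.Dataset):
            return
        if obj.ndim not in (1, 2) or obj.dtype.kind not in "iuf":
            return
        # A nested name like "a/b/c" becomes the flat file name "a_b_c.csv".
        path = os.path.join(out_dir, name.replace("/", "_") + ext)
        np.savetxt(path, obj[...], fmt="%g", delimiter=delimiter)
        written.append(path)

    with h5py.File(h5_path, "r") as f:
        f.visititems(visit)
    return written
```

Calling dump_all_datasets('foo.h5', 'out', delimiter='\t', ext='.tsv') would produce TSV files instead. Datasets with string or compound types are skipped here and would need their own handling.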

Upvotes: 0
