Sameer Damir

Reputation: 1644

How to read HDF5 files in Python

I am trying to read data from an HDF5 file in Python. I can open the file using h5py, but I cannot figure out how to access the data within it.

My code

import h5py
import numpy as np
f1 = h5py.File(file_name, 'r+')

This works and the file is read. But how can I access data inside the file object f1?

Upvotes: 151

Views: 551150

Answers (13)

Maicon Mauricio

Reputation: 3041

Reading

Use the visititems method from h5py. The callback is called once for every object in the hierarchy, both groups and datasets.

import h5py

# Open the HDF5 file in read mode
file_path = 'your_file.h5'

with h5py.File(file_path, 'r') as file:
    # Function to recursively print the HDF5 dataset hierarchy
    def print_hdf5_item(name, obj):
        # name is a path relative to the file root, like group1/group2/dataset
        if isinstance(obj, h5py.Group):
            # Do something like creating a dictionary entry
            print(f'Group: {name}')
        elif isinstance(obj, h5py.Dataset):
            # Do something with obj like converting to a pandas.Series 
            # and storing to a dictionary entry
            print(f'Dataset: {name}')

    # Visit all items in the HDF5 file and print their names
    file.visititems(print_hdf5_item)

or use pandas.read_hdf:

import pandas as pd
df = pd.read_hdf('./store.h5')

Note that your data might not map directly to a DataFrame, so the former option is more flexible.
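As a minimal sketch of the "do something with obj" step in the callback approach, the following collects every dataset into a dictionary keyed by its path. The in-memory file and the names group1, group2, temperature, and counts are made up for illustration, and load_datasets is a hypothetical helper, not part of h5py.

```python
import io

import h5py
import numpy as np

# Build a small in-memory HDF5 file so the sketch is self-contained
# (group and dataset names are invented for this example).
buffer = io.BytesIO()
with h5py.File(buffer, 'w') as f:
    f['group1/temperature'] = np.array([20.5, 21.0, 19.8])
    f['group1/group2/counts'] = np.array([1, 2, 3])


def load_datasets(h5_file):
    """Return a dict mapping each dataset path to a NumPy array."""
    datasets = {}

    def collect(name, obj):
        # visititems calls this for every group and dataset;
        # only datasets hold array data, so groups are skipped.
        if isinstance(obj, h5py.Dataset):
            datasets[name] = obj[()]

    h5_file.visititems(collect)
    return datasets


with h5py.File(buffer, 'r') as f:
    data = load_datasets(f)

print(sorted(data))
```

The arrays are read while the file is still open, so the dictionary remains usable after the with block exits.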


Writing

If using Pandas, you can use pandas.DataFrame.to_hdf:

# df is a DataFrame object
df.to_hdf('database.h5', key='group/subgroup', format='table', mode='a')

Upvotes: 2

Syrtis Major

Reputation: 3949

I recommend a wrapper of h5py, H5Attr, that allows you to load hdf5 data easily via attributes such as group.dataset (equivalent to the original group['dataset']) with IPython/Jupyter tab completion.

The code is here. Some usage examples follow; you can try the code below yourself:

# create example HDF5 file for this guide
import h5py, io
file = io.BytesIO()
with h5py.File(file, 'w') as fp:
    fp['0'] = [1, 2]
    fp['a'] = [3, 4]
    fp['b/c'] = 5
    fp.attrs['d'] = 's'

# import package
from h5attr import H5Attr

# open file
f = H5Attr(file)

# easy access to members, with tab completion in IPython/Jupyter
f.a, f['a']

# also works for subgroups, but note that f['b/c'] is more efficient
# because it does not create f['b']
f.b.c, f['b'].c, f['b/c']

# access to HDF5 attrs via a H5Attr wrapper
f._attrs.d, f._attrs['d']

# show summary of the data
f._show()
# 0   int64 (2,)
# a   int64 (2,)
# b/  1 members

# lazy (default) and non-lazy mode
f = H5Attr(file)
f.a  # <HDF5 dataset "a": shape (2,), type "<i8">

f = H5Attr(file, lazy=False)
f.a  # array([3, 4])

Upvotes: 0

ashish bansal

Reputation: 311

Use the code below to read the data and convert it into NumPy arrays:

import h5py
import numpy as np

f1 = h5py.File('data_1.h5', 'r')
print(list(f1.keys()))
X1 = f1['x']
y1 = f1['y']
# Dataset.value was removed in h5py 3.0; index with [()] instead
df1 = np.array(X1[()])
dfy1 = np.array(y1[()])
print(df1.shape)
print(dfy1.shape)

Preferred method to read dataset values into a numpy array:

import h5py
# use Python file context manager:
with h5py.File('data_1.h5', 'r') as f1:
    print(list(f1.keys()))  # print list of root level objects
    # following assumes 'x' and 'y' are dataset objects
    ds_x1 = f1['x']  # returns h5py dataset object for 'x'
    ds_y1 = f1['y']  # returns h5py dataset object for 'y'
    arr_x1 = f1['x'][()]  # returns np.array for 'x'
    arr_y1 = f1['y'][()]  # returns np.array for 'y'
    arr_x1 = ds_x1[()]  # uses dataset object to get np.array for 'x'
    arr_y1 = ds_y1[()]  # uses dataset object to get np.array for 'y'
    print (arr_x1.shape)
    print (arr_y1.shape)
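Indexing with [()] reads the whole dataset into memory. For large datasets it can be preferable to slice the dataset object so that only the selected part is read. A small sketch, assuming a 2-D dataset named 'x' (created in memory here purely for illustration):

```python
import io

import h5py
import numpy as np

# In-memory stand-in for 'data_1.h5' with a 4x3 dataset named 'x'
buffer = io.BytesIO()
with h5py.File(buffer, 'w') as f1:
    f1.create_dataset('x', data=np.arange(12).reshape(4, 3))

with h5py.File(buffer, 'r') as f1:
    ds_x1 = f1['x']
    shape = ds_x1.shape        # metadata only, no array data is read
    first_rows = ds_x1[:2]     # reads just the first two rows
    last_column = ds_x1[:, 2]  # reads just the third column
    everything = ds_x1[()]     # reads the full dataset

print(shape)
print(first_rows)
print(last_column)
```

Each slice returns an ordinary NumPy array, so it stays valid after the file is closed.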

Upvotes: 6

Martin Thoma

Reputation: 136665

Read HDF5

import h5py
filename = "file.hdf5"

with h5py.File(filename, "r") as f:
    # Print all root level object names (aka keys) 
    # these can be group or dataset names 
    print("Keys: %s" % f.keys())
    # get first object name/key; may or may NOT be a group
    a_group_key = list(f.keys())[0]

    # get the object type for a_group_key: usually group or dataset
    print(type(f[a_group_key])) 

    # If a_group_key is a group name, 
    # this gets the object names in the group and returns as a list
    data = list(f[a_group_key])

    # If a_group_key is a dataset name, 
    # this gets the dataset values and returns as a list
    data = list(f[a_group_key])
    # preferred methods to get dataset values:
    ds_obj = f[a_group_key]      # returns as a h5py dataset object
    ds_arr = f[a_group_key][()]  # returns as a numpy array

Write HDF5

import h5py

# Create random data
import numpy as np
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("dataset_name", data=data_matrix)

See h5py docs for more information.
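To check that the write and read snippets fit together, here is a minimal round-trip sketch. It uses an in-memory buffer in place of "file.hdf5", and the gzip compression and the "units" attribute are my additions for illustration, not something the snippets above require.

```python
import io

import h5py
import numpy as np

data_matrix = np.random.uniform(-1, 1, size=(10, 3))

buffer = io.BytesIO()  # stands in for a path such as "file.hdf5"

# Write: one compressed dataset plus a small piece of metadata
with h5py.File(buffer, "w") as data_file:
    dset = data_file.create_dataset(
        "dataset_name", data=data_matrix, compression="gzip"
    )
    dset.attrs["units"] = "arbitrary"  # hypothetical attribute

# Read: get the values back as a NumPy array, and the attribute
with h5py.File(buffer, "r") as data_file:
    restored = data_file["dataset_name"][()]
    units = data_file["dataset_name"].attrs["units"]

print(np.array_equal(restored, data_matrix))
```

Compression is transparent on reading: the same [()] indexing works whether or not the dataset was compressed.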

Alternatives

For your application, the following might be important:

  • Support by other programming languages
  • Reading / writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

If you are instead looking for a way to write configuration files, you might want to read my short article, Configuration files in Python.

Upvotes: 235

Daksh

Reputation: 1154

Reading the file

import h5py

f = h5py.File(file_name, mode)

Studying the structure of the file by printing which HDF5 objects are present at the root level

for key in f.keys():
    print(key)  # name of a root-level object in the HDF5 file; can be a group or a dataset
    print(type(f[key]))  # get the object type: usually group or dataset

Extracting the data

#Get the HDF5 group; key needs to be a group name from above
group = f[key]

# Check out what keys are inside that group.
for key in group.keys():
    print(key)

# This assumes group[some_key_inside_the_group] is a dataset, 
# and returns a np.array:
data = group[some_key_inside_the_group][()]
#Do whatever you want with data

#After you are done
f.close()
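The steps above inspect one level at a time. If the nesting depth is not known in advance, the same idea can be written as a small recursive helper. This is only a sketch: describe is a hypothetical function, the file contents are invented, and it assumes the file contains only groups and datasets.

```python
import io

import h5py
import numpy as np


def describe(group, indent=0):
    """Return text lines describing a group's contents, recursively."""
    lines = []
    prefix = "  " * indent
    for key in group.keys():
        item = group[key]
        if isinstance(item, h5py.Group):
            lines.append(f"{prefix}{key}/ (group)")
            # Descend into the subgroup with one more level of indentation
            lines.extend(describe(item, indent + 1))
        else:
            lines.append(
                f"{prefix}{key} (dataset, shape={item.shape}, dtype={item.dtype})"
            )
    return lines


# Small in-memory example file (names are made up)
buffer = io.BytesIO()
with h5py.File(buffer, "w") as f:
    f["measurements/x"] = np.zeros((2, 3))
    f["label"] = np.array([1, 2, 3])

with h5py.File(buffer, "r") as f:
    tree = describe(f)

print("\n".join(tree))
```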

Upvotes: 46

Zaeem Asghar

Reputation: 1

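Use this; it works fine for me:

import h5py

def read_hdf5():
    weights = {}

    keys = []
    with h5py.File("path.h5", 'r') as f:
        f.visit(keys.append)  # collect the names of all objects in the file
        for key in keys:
            if ':' in key:  # only keys containing ':' are treated as data
                print(f[key].name)
                weights[f[key].name] = f[key][()]
    return weights

print(read_hdf5())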

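If you are using h5py <= 2.9.0, you can use the older .value attribute instead (it was removed in h5py 3.0):

import h5py

def read_hdf5():
    weights = {}

    keys = []
    with h5py.File("path.h5", 'r') as f:
        f.visit(keys.append)  # collect the names of all objects in the file
        for key in keys:
            if ':' in key:  # only keys containing ':' are treated as data
                print(f[key].name)
                weights[f[key].name] = f[key].value
    return weights

print(read_hdf5())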

Upvotes: 0

Machzx

Reputation: 49

If you have named datasets in the HDF5 file, you can use the following code to read these datasets and convert them into NumPy arrays:

import h5py
import numpy as np

file = h5py.File('filename.h5', 'r')

xdata = file.get('xdata')
xdata = np.array(xdata)

If your file is in a different directory, you can add the path in front of 'filename.h5'.
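One thing to watch with get: it returns None when the name does not exist, and np.array(None) does not raise an error, so a typo in the dataset name can go unnoticed. A short sketch of a guarded version, using an in-memory file and a deliberately missing name 'ydata' (both made up for illustration):

```python
import io

import h5py
import numpy as np

# In-memory stand-in for 'filename.h5' containing only 'xdata'
buffer = io.BytesIO()
with h5py.File(buffer, 'w') as f:
    f['xdata'] = np.array([1.0, 2.0, 3.0])

with h5py.File(buffer, 'r') as file:
    missing = file.get('ydata')   # no such dataset, so this is None
    has_ydata = 'ydata' in file   # membership test, False here

    xdata = file.get('xdata')
    if xdata is not None:
        xdata = np.array(xdata)   # convert only when the dataset exists

print(missing, has_ydata, xdata)
```

Indexing with file['ydata'] instead of get would raise a KeyError, which may be the more convenient behavior when the dataset is required.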

Upvotes: 2

Patol75

Reputation: 4547

Using bits of answers from this question and the latest doc, I was able to extract my numerical arrays using

import h5py
with h5py.File(filename, 'r') as h5f:
    h5x = h5f[list(h5f.keys())[0]]['x'][()]

Where 'x' is simply the X coordinate in my case.

Upvotes: 0

Judice

Reputation: 19

# Only applies when the .h5 file is a saved Keras model
from keras.models import load_model

h = load_model('FILE_NAME.h5')

Upvotes: 1

Attila

Reputation: 321

Here's a simple function I just wrote which reads a .hdf5 file generated by the save_weights function in keras and returns a dict with layer names and weights:

import h5py

def read_hdf5(path):

    weights = {}

    keys = []
    with h5py.File(path, 'r') as f: # open file
        f.visit(keys.append) # append all keys to list
        for key in keys:
            if ':' in key: # contains data if ':' in key
                print(f[key].name)
                # [()] replaces .value, which was removed in h5py 3.0
                weights[f[key].name] = f[key][()]
    return weights

https://gist.github.com/Attila94/fb917e03b04035f3737cc8860d9e9f9b.

Haven't tested it thoroughly but does the job for me.

Upvotes: 10

Raza

Reputation: 189

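To read the raw content of an .hdf5 file as an array, you can do something like the following. Note that np.fromfile interprets the file's bytes directly, including the HDF5 header and metadata, so the result is not the stored datasets; use h5py to get those.

import numpy as np
myarray = np.fromfile('file.hdf5', dtype=float)
print(myarray)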
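A caveat worth illustrating: an HDF5 file is a structured container, not a flat dump of numbers, so its raw bytes contain more than the stored values. The sketch below writes three floats to an in-memory file (the dataset name "values" is made up) and compares the raw bytes with what h5py returns.

```python
import io

import h5py
import numpy as np

buffer = io.BytesIO()
with h5py.File(buffer, "w") as f:
    f["values"] = np.array([1.0, 2.0, 3.0])

raw = buffer.getvalue()
signature = raw[:8]  # HDF5 files begin with an 8-byte format signature

with h5py.File(buffer, "r") as f:
    values = f["values"][()]  # the three stored floats

print(signature)
print(len(raw), values.nbytes)
```

The file is larger than the 24 bytes the three float64 values occupy, because it also holds the signature, headers, and other metadata. Interpreting all of those bytes as floats yields mostly meaningless numbers.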

Upvotes: 7

Nafiul Islam

Reputation: 82560

What you need to do is create a dataset. If you take a look at the quickstart guide, it shows you that you need to use the file object in order to create a dataset. So, f.create_dataset and then you can read the data. This is explained in the docs.

Upvotes: 0

Danny

Reputation: 481

You can use pandas:

import pandas as pd
pd.read_hdf(filename, key)
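pd.read_hdf is aimed at files laid out the way pandas writes them (via PyTables), so it may fail on a file created with plain h5py. In that case one option is to read the datasets with h5py and build the DataFrame yourself. A minimal sketch, assuming two equal-length 1-D datasets with the made-up names "time" and "signal":

```python
import io

import h5py
import numpy as np
import pandas as pd

# In-memory stand-in for a file written with plain h5py
buffer = io.BytesIO()
with h5py.File(buffer, "w") as f:
    f["time"] = np.array([0.0, 0.5, 1.0])
    f["signal"] = np.array([10, 20, 30])

with h5py.File(buffer, "r") as f:
    # Read each dataset into a NumPy array and use its name as the column name
    df = pd.DataFrame({name: f[name][()] for name in ("time", "signal")})

print(df)
```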

Upvotes: 31
