Reputation: 1644
I am trying to read data from an HDF5 file in Python. I can open the file using h5py, but I cannot figure out how to access the data within it.
import h5py
import numpy as np
f1 = h5py.File(file_name,'r+')
This works and the file is read. But how can I access data inside the file object f1?
Upvotes: 151
Views: 551150
Reputation: 3041
Use the visititems function from h5py. The callback function is called once for every item in the hierarchy, both groups and datasets.
import h5py
# Open the HDF5 file in read mode
file_path = 'your_file.h5'
with h5py.File(file_path, 'r') as file:
    # Callback that prints each item in the HDF5 hierarchy
    def print_hdf5_item(name, obj):
        # name is a path relative to the visited group, like group1/group2/dataset
        if isinstance(obj, h5py.Group):
            # Do something like creating a dictionary entry
            print(f'Group: {name}')
        elif isinstance(obj, h5py.Dataset):
            # Do something with obj like converting to a pandas.Series
            # and storing to a dictionary entry
            print(f'Dataset: {name}')
    # Visit all items in the HDF5 file and print their names
    file.visititems(print_hdf5_item)
Or use pandas.read_hdf:
import pandas as pd
df = pd.read_hdf('./store.h5')
Notice that your data might not map directly to a DataFrame. The former option is more flexible.
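As an illustration of the visititems approach, the sketch below collects every dataset into a dictionary keyed by its path. The file contents and the helper name load_datasets are made up for this example, and reading everything into memory is only reasonable for small files.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical example file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as f:
    f["group1/a"] = np.arange(3)
    f["group1/sub/b"] = np.ones((2, 2))
    f["c"] = 5

def load_datasets(h5_path):
    """Return {dataset path: values} for every dataset in the file."""
    found = {}

    def collect(name, obj):
        # Called once per group and dataset; only datasets carry values.
        if isinstance(obj, h5py.Dataset):
            found[name] = obj[()]  # read the values while the file is open

    with h5py.File(h5_path, "r") as f:
        f.visititems(collect)
    return found

data = load_datasets(path)
print(sorted(data))
```

The callback returns None, so the traversal continues through the whole hierarchy; returning any other value from it would stop the visit early.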
If using Pandas, you can write a DataFrame to an HDF5 file with pandas.DataFrame.to_hdf:
# df is a DataFrame object
df.to_hdf('database.h5', key='group/subgroup', format='table', mode='a')
Upvotes: 2
Reputation: 3949
I recommend H5Attr, a wrapper around h5py that lets you load HDF5 data easily via attributes such as group.dataset (equivalent to the original group['dataset']), with IPython/Jupyter tab completion.
The code is here. Here are some usage examples; you can try the code below yourself:
# create example HDF5 file for this guide
import h5py, io
file = io.BytesIO()
with h5py.File(file, 'w') as fp:
    fp['0'] = [1, 2]
    fp['a'] = [3, 4]
    fp['b/c'] = 5
    fp.attrs['d'] = 's'
# import package
from h5attr import H5Attr
# open file
f = H5Attr(file)
# easy access to members, with tab completion in IPython/Jupyter
f.a, f['a']
# also work for subgroups, but note that f['b/c'] is more efficient
# because it does not create f['b']
f.b.c, f['b'].c, f['b/c']
# access to HDF5 attrs via a H5Attr wrapper
f._attrs.d, f._attrs['d']
# show summary of the data
f._show()
# 0 int64 (2,)
# a int64 (2,)
# b/ 1 members
# lazy (default) and non-lazy mode
f = H5Attr(file)
f.a # <HDF5 dataset "a": shape (2,), type "<i8">
f = H5Attr(file, lazy=False)
f.a # array([3, 4])
Upvotes: 0
Reputation: 311
Use the code below to read the data and convert it into numpy arrays:
import h5py
import numpy as np
f1 = h5py.File('data_1.h5', 'r')
list(f1.keys())
X1 = f1['x']
y1 = f1['y']
# .value was removed in h5py 3.0; on current versions use X1[()] instead
df1 = np.array(X1.value)
dfy1 = np.array(y1.value)
print(df1.shape)
print(dfy1.shape)
Preferred method to read dataset values into a numpy array:
import h5py
# use Python file context manager:
with h5py.File('data_1.h5', 'r') as f1:
    print(list(f1.keys()))  # print list of root level objects
    # following assumes 'x' and 'y' are dataset objects
    ds_x1 = f1['x']  # returns h5py dataset object for 'x'
    ds_y1 = f1['y']  # returns h5py dataset object for 'y'
    arr_x1 = f1['x'][()]  # returns np.array for 'x'
    arr_y1 = f1['y'][()]  # returns np.array for 'y'
    arr_x1 = ds_x1[()]  # uses dataset object to get np.array for 'x'
    arr_y1 = ds_y1[()]  # uses dataset object to get np.array for 'y'
    print(arr_x1.shape)
    print(arr_y1.shape)
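As a complement to reading whole datasets with [()], here is a minimal sketch of reading only part of a dataset by slicing the dataset object. The temporary file and the 10x2 dataset 'x' are made up for illustration; your own file will have different names and shapes.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical example file: a 10x2 integer dataset named 'x'.
path = os.path.join(tempfile.mkdtemp(), "data_1.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("x", data=np.arange(20).reshape(10, 2))

with h5py.File(path, "r") as f1:
    ds_x1 = f1["x"]
    # Shape and dtype are available without reading the values.
    print(ds_x1.shape, ds_x1.dtype)
    first_rows = ds_x1[:3]     # reads only the first three rows
    second_col = ds_x1[:, 1]   # reads only the second column
    everything = ds_x1[()]     # reads the whole dataset

# The results are ordinary numpy arrays and remain usable after the file is closed.
print(first_rows.shape, second_col.shape, everything.shape)
```

Slicing is worth considering when a dataset is too large to load at once, since only the requested region is read from disk.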
Upvotes: 6
Reputation: 136665
import h5py
filename = "file.hdf5"
with h5py.File(filename, "r") as f:
    # Print all root level object names (aka keys)
    # these can be group or dataset names
    print("Keys: %s" % f.keys())
    # get first object name/key; may or may NOT be a group
    a_group_key = list(f.keys())[0]

    # get the object type for a_group_key: usually group or dataset
    print(type(f[a_group_key]))

    # If a_group_key is a group name,
    # this gets the object names in the group and returns as a list
    data = list(f[a_group_key])

    # If a_group_key is a dataset name,
    # this gets the dataset values and returns as a list
    data = list(f[a_group_key])
    # preferred methods to get dataset values:
    ds_obj = f[a_group_key]  # returns as a h5py dataset object
    ds_arr = f[a_group_key][()]  # returns as a numpy array
import h5py
# Create random data
import numpy as np
data_matrix = np.random.uniform(-1, 1, size=(10, 3))
# Write data to HDF5
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("dataset_name", data=data_matrix)
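To check that the write and read snippets fit together, the following sketch does a full round trip. It writes to a temporary directory rather than the working directory; that path is an assumption made only to keep the example self-contained.

```python
import os
import tempfile

import h5py
import numpy as np

# Temporary location chosen for illustration only.
path = os.path.join(tempfile.mkdtemp(), "file.hdf5")
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write the array to a dataset.
with h5py.File(path, "w") as data_file:
    data_file.create_dataset("dataset_name", data=data_matrix)

# Read it back as a numpy array.
with h5py.File(path, "r") as data_file:
    restored = data_file["dataset_name"][()]

# True when the stored values match what was written.
print(np.array_equal(restored, data_matrix))
```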
See h5py docs for more information.
For your application, the following might be important:
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python
Upvotes: 235
Reputation: 1154
Reading the file
import h5py
f = h5py.File(file_name, mode)
Studying the structure of the file by printing what HDF5 groups are present
for key in f.keys():
    print(key)  # names of the root-level objects in the HDF5 file; these can be groups or datasets
    print(type(f[key]))  # get the object type: usually group or dataset
Extracting the data
#Get the HDF5 group; key needs to be a group name from above
group = f[key]
#Checkout what keys are inside that group.
for key in group.keys():
    print(key)
# This assumes group[some_key_inside_the_group] is a dataset,
# and returns a np.array:
data = group[some_key_inside_the_group][()]
#Do whatever you want with data
#After you are done
f.close()
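When groups are nested more than one level deep, the same idea can be applied recursively. The sketch below is one possible way to do it; the helper name describe and the example file layout are hypothetical, and it assumes every non-group item is a dataset.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical example file with a nested group and two datasets.
path = os.path.join(tempfile.mkdtemp(), "nested.h5")
with h5py.File(path, "w") as f:
    f["results/run1/values"] = np.zeros(4)
    f["meta"] = 7

def describe(group, indent=0):
    """Return one text line per item, descending into subgroups."""
    lines = []
    pad = "  " * indent
    for key in sorted(group.keys()):
        item = group[key]
        if isinstance(item, h5py.Group):
            lines.append(f"{pad}{key}/ (group)")
            lines.extend(describe(item, indent + 1))
        else:
            # Assumption: anything that is not a group is a dataset.
            lines.append(f"{pad}{key} (dataset, shape={item.shape})")
    return lines

with h5py.File(path, "r") as f:
    tree = describe(f)

print("\n".join(tree))
```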
Upvotes: 46
Reputation: 1
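Use this; it works fine for me:
import h5py

def read_hdf5():
    weights = {}
    keys = []
    with h5py.File("path.h5", 'r') as f:
        f.visit(keys.append)  # collect the name of every group and dataset
        for key in keys:
            if ':' in key:  # in this file, dataset names contain ':'
                print(f[key].name)
                weights[f[key].name] = f[key][()]
    return weights

print(read_hdf5())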
If you are using h5py<='2.9.0', you can use .value instead:
import h5py

def read_hdf5():
    weights = {}
    keys = []
    with h5py.File("path.h5", 'r') as f:
        f.visit(keys.append)  # collect the name of every group and dataset
        for key in keys:
            if ':' in key:  # in this file, dataset names contain ':'
                print(f[key].name)
                weights[f[key].name] = f[key].value
    return weights

print(read_hdf5())
Upvotes: 0
Reputation: 49
If you have named datasets in the hdf file then you can use the following code to read and convert these datasets in numpy arrays:
import h5py
import numpy as np
file = h5py.File('filename.h5', 'r')
xdata = file.get('xdata')
xdata = np.array(xdata)
If your file is in a different directory, you can add the path in front of 'filename.h5'.
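One detail about get is worth illustrating: it returns None when the name does not exist, instead of raising KeyError as file['name'] would. The sketch below assumes a made-up file containing only 'xdata'; the name 'ydata' is deliberately missing.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical example file with a single dataset named 'xdata'.
path = os.path.join(tempfile.mkdtemp(), "filename.h5")
with h5py.File(path, "w") as f:
    f["xdata"] = np.linspace(0.0, 1.0, 5)

with h5py.File(path, "r") as file:
    xdata = file.get("xdata")
    missing = file.get("ydata")   # not in the file, so this is None
    has_y = "ydata" in file       # membership test, no exception raised
    # Convert while the file is still open; the dataset handle is invalid afterwards.
    x_array = np.array(xdata) if xdata is not None else None

print(x_array.shape, missing, has_y)
```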
Upvotes: 2
Reputation: 4547
Using bits of answers from this question and the latest doc, I was able to extract my numerical arrays using
import h5py
with h5py.File(filename, 'r') as h5f:
    h5x = h5f[list(h5f.keys())[0]]['x'][()]
Where 'x' is simply the X coordinate in my case.
Upvotes: 0
Reputation: 321
Here's a simple function I just wrote which reads a .hdf5 file generated by the save_weights function in keras and returns a dict with layer names and weights:
import h5py

def read_hdf5(path):
    weights = {}
    keys = []
    with h5py.File(path, 'r') as f:  # open file
        f.visit(keys.append)  # append all keys to list
        for key in keys:
            if ':' in key:  # contains data if ':' in key
                print(f[key].name)
                # [()] replaces .value, which was removed in h5py 3.0
                weights[f[key].name] = f[key][()]
    return weights
https://gist.github.com/Attila94/fb917e03b04035f3737cc8860d9e9f9b.
Haven't tested it thoroughly but does the job for me.
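A possible variant, sketched under assumptions: instead of testing for ':' in the name, it selects datasets by type and groups them by their top-level group, which is treated here as the layer name. The file layout below (names such as dense/dense/kernel:0) is made up to resemble a weights file and is not taken from Keras itself; read_weights_by_layer is a hypothetical helper.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file that mimics a weights layout: layer/layer/weight_name:0
path = os.path.join(tempfile.mkdtemp(), "weights.h5")
with h5py.File(path, "w") as f:
    f["dense/dense/kernel:0"] = np.zeros((4, 2))
    f["dense/dense/bias:0"] = np.zeros(2)
    f["dense_1/dense_1/kernel:0"] = np.ones((2, 1))

def read_weights_by_layer(h5_path):
    """Return {top-level group: {dataset path: numpy array}}."""
    weights = {}

    def collect(name, obj):
        # Only datasets hold values; groups are skipped.
        if isinstance(obj, h5py.Dataset):
            layer = name.split("/")[0]  # assumption: first path part is the layer
            weights.setdefault(layer, {})[obj.name] = obj[()]

    with h5py.File(h5_path, "r") as f:
        f.visititems(collect)
    return weights

weights = read_weights_by_layer(path)
for layer, params in weights.items():
    print(layer, {name: arr.shape for name, arr in params.items()})
```

Checking the object type avoids relying on a naming convention, at the cost of also picking up any dataset that is not a weight.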
Upvotes: 10
Reputation: 189
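To read the raw content of a .hdf5 file as an array, you can do something as follows. Note that np.fromfile does not parse the HDF5 structure, so the result contains the file's headers and metadata interpreted as numbers, not just the dataset values:
import numpy as np
myarray = np.fromfile('file.hdf5', dtype=float)
print(myarray)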
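To make the difference between a raw read and a parsed read concrete, here is a small sketch. It writes three floats to a made-up temporary file, reads the file byte by byte with np.fromfile, and compares that with what h5py returns for the dataset.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical example file holding one small dataset.
path = os.path.join(tempfile.mkdtemp(), "file.hdf5")
values = np.array([1.0, 2.0, 3.0])
with h5py.File(path, "w") as f:
    f["x"] = values

# Raw read: every byte of the file, including HDF5 headers and metadata.
raw = np.fromfile(path, dtype=np.uint8)

# Parsed read: only the values stored in the dataset.
with h5py.File(path, "r") as f:
    parsed = f["x"][()]

print(raw[:8].tobytes())        # the file starts with the HDF5 format signature
print(raw.size, parsed.nbytes)  # the file is much larger than the 24 bytes of data
```

The raw array starts with the format signature rather than with the stored numbers, which is why a library such as h5py is needed to recover the actual datasets.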
Upvotes: 7
Reputation: 82560
What you need to do is create a dataset. If you take a look at the quickstart guide, it shows that you need to use the file object to create a dataset: call f.create_dataset, and then you can read the data. This is explained in the docs.
Upvotes: 0