Reputation: 311
I have a few MATLAB variables with the following fields, which I saved to test.mat using the -v7.3 flag. I want to read them back in Python with the h5py module.
load('test.mat') % gives me this
struct with fields:
volume: [4240×1 double]
centroid: [4240×3 double]
faces: {4240×1 cell}
nuc: {4240×1 cell}
I can read the double fields but cannot access the cell fields. Is there any way to access the nuc and faces variables from Python?
>>> import h5py
>>> name = 'test.mat'
>>> f = h5py.File(name, 'r')
>>> f.keys()
<KeysViewHDF5 ['#refs#', 'volume', 'centroid', 'faces', 'nuc']>
>>> o1 = f['centroid']
>>> o1
<HDF5 dataset "centroid": shape (3, 4240), type "<f8">
>>> o1[:, 0]
array([ -387.82973928, 533.54789111, -7359.64917621])
>>> o3 = f['nuc']
>>> o3
<HDF5 dataset "nuc": shape (1, 4240), type "|O">
>>> type(o3)
<class 'h5py._hl.dataset.Dataset'>
>>> type(o3[0])
<class 'numpy.ndarray'>
>>> type(o3[0][0])
<class 'h5py.h5r.Reference'>
>>> o3[0][0]
<HDF5 object reference>
>>> o3[0]
array([<HDF5 object reference>, <HDF5 object reference>,
<HDF5 object reference>, ..., <HDF5 object reference>,
<HDF5 object reference>, <HDF5 object reference>], dtype=object)
I tried every option I could think of, but I cannot see the numerical values of the nuc variable. Any suggestions would be appreciated.
Thanks for the comments, everyone. The following command works now.
>>> f[f['nuc'][0][0]][:]
array([[ -733.94435313, -733.66995189, -734.09632262, ...,
-733.66832197, -733.81233202, -733.54615564],
[ 247.76823184, 247.49908481, 248.17514583, ...,
240.16088783, 240.56909865, 240.84810507],
[-7485.86866961, -7485.92114207, -7485.93468626, ...,
-7508.16909395, -7508.16306386, -7508.20712349]])
>>> f[f['nuc'][0][0]][:].shape
(3, 1512)
>>> f[f['nuc'][0][1]][:].shape
(3, 1491)
>>> f[f['nuc'][0][2]][:].shape
(3, 1556)
Upvotes: 0
Views: 863
Reputation: 8006
A .mat file saved with the -v7.3 flag is an HDF5 file, and its data schema relies on "object references". An object reference is not the data itself but a pointer to data stored elsewhere in the file.
You use the object reference to get to the data (in your example, the nuc values).
You can get the data for the first element of nuc like this: arr = f[ f['nuc'][0][0] ][:], or arr = f[ o3[0][0] ][:]
(You can also use comma-separated indices if you prefer: f[ f['nuc'][0,0] ][:])
Deconstructing the expression above:
f['nuc'] --> is a field (column) of data, stored as a dataset of shape (1, 4240)
f['nuc'][0] --> is the first row of that dataset (an array of object references)
f['nuc'][0][0] --> is the first object reference in the array
f[ f['nuc'][0][0] ][:] --> dereferences the object reference and reads the data, i.e. reads the array
Alternatively, you can do this (the method I prefer for readability):
obj_ref = f['nuc'][0][0] --> returns the first object reference
f[obj_ref][:] --> dereferences the object reference and reads the array data
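To read every cell rather than only the first, the same dereferencing step can be repeated in a loop. Below is a minimal sketch, assuming f is an already opened h5py.File and that every reference points to an array dataset (as the nuc cells do in the question). The name read_cell_column is a hypothetical helper, not part of h5py.

```python
def read_cell_column(f, name):
    """Return a list with the contents of every cell stored under `name`.

    `f` is assumed to be an open h5py.File (hypothetical helper, not an
    h5py function). `f[name]` is expected to be a 2-D dataset of object
    references, e.g. shape (1, 4240) for the `nuc` field in the question.
    """
    refs = f[name][:]                # read all object references at once
    cells = []
    for row in refs:                 # one row per first-axis entry
        for ref in row:              # each entry is an object reference
            cells.append(f[ref][:])  # dereference, then read the full array
    return cells


# Usage sketch (file name taken from the question):
#   import h5py
#   with h5py.File('test.mat', 'r') as f:
#       nuc = read_cell_column(f, 'nuc')
#   nuc[0].shape   # shape of the first cell's array
```

Note that the arrays come back with their axes reversed relative to MATLAB (centroid is 4240×3 in MATLAB but has shape (3, 4240) in h5py), so transposing each array with .T restores the MATLAB orientation if you need it.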
This SO Q&A gives a basic explanation of reading .mat files:
read-matlab-v7-3-file-into-python-list-of-numpy-arrays-via-h5py
I wrote a more complete explanation (for reading SVHN datasets). You can access it here:
what-is-the-difference-between-the-two-ways-of-accessing-the-hdf5-group-in-svhn
Upvotes: 2