ankit agrawal

Reputation: 311

How to access cell variables stored in .mat file using h5py module?

I have a few MATLAB variables with the fields shown below. I saved them to test.mat using the -v7.3 flag and want to read them back with the h5py module.

load('test.mat'); % gives me this

struct with fields:
       volume: [4240×1 double]
     centroid: [4240×3 double]
        faces: {4240×1 cell}
          nuc: {4240×1 cell}

I can read the double fields, but I am unable to access the cell fields. Is there any way I can access the nuc and faces variables from Python?

>>> import h5py
>>> name = 'test.mat'
>>> f = h5py.File(name, 'r')
>>> f.keys()
<KeysViewHDF5 ['#refs#', 'volume', 'centroid', 'faces', 'nuc']>
>>> o1 = f['centroid']
>>> o1
<HDF5 dataset "centroid": shape (3, 4240), type "<f8">
>>> o1[:, 0]
array([ -387.82973928,   533.54789111, -7359.64917621])
>>> o3 = f['nuc']
>>> o3
<HDF5 dataset "nuc": shape (1, 4240), type "|O">
>>> type(o3)
<class 'h5py._hl.dataset.Dataset'>
>>> type(o3[0])
<class 'numpy.ndarray'>
>>> type(o3[0][0])
<class 'h5py.h5r.Reference'>
>>> o3[0][0]
<HDF5 object reference>
>>> o3[0]
array([<HDF5 object reference>, <HDF5 object reference>,
       <HDF5 object reference>, ..., <HDF5 object reference>,
       <HDF5 object reference>, <HDF5 object reference>], dtype=object)

I tried every option I could think of, but I cannot see the numerical values of the nuc variable. Any suggestions would be appreciated.

Thanks for the comments, everyone. The following command works now:

>>> f[f['nuc'][0][0]][:]
array([[ -733.94435313,  -733.66995189,  -734.09632262, ...,
         -733.66832197,  -733.81233202,  -733.54615564],
       [  247.76823184,   247.49908481,   248.17514583, ...,
          240.16088783,   240.56909865,   240.84810507],
       [-7485.86866961, -7485.92114207, -7485.93468626, ...,
        -7508.16909395, -7508.16306386, -7508.20712349]])
>>> f[f['nuc'][0][0]][:].shape
(3, 1512)
>>> f[f['nuc'][0][1]][:].shape
(3, 1491)
>>> f[f['nuc'][0][2]][:].shape
(3, 1556)

Upvotes: 0

Views: 863

Answers (1)

kcw78

Reputation: 8006

A .mat file saved with the -v7.3 flag is in HDF5 format, and MATLAB stores cell arrays in it using "object references". An object reference is not the data itself but a pointer to the data, which lives elsewhere in the file (typically under the '#refs#' group). You dereference the object reference to get to the data (in your example, the nuc values). You can get the data for the first element of nuc like this:
arr = f[ f['nuc'][0][0] ][:], or arr = f[ o3[0][0] ][:]
(You can also use comma delimiters if you prefer: f[ f['nuc'][0,0] ][:] )

Deconstructing the expression above:
f['nuc'] --> is the dataset for the nuc field; its shape is (1, 4240) and it holds object references, not numbers
f['nuc'][0] --> is the first (and only) row of that dataset: an array of 4240 object references
f['nuc'][0][0] --> is the first object reference in that array
f[ f['nuc'][0][0] ][:] --> dereferences the object reference and reads the data, i.e. reads the array for the first cell
Alternatively, you can do this (the method I prefer for readability):
obj_ref = f['nuc'][0][0] --> returns the first object reference
f[obj_ref][:] --> dereferences the object reference and reads the array data
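
To read every cell rather than just the first, the same dereferencing step can be put in a loop. The following is a minimal sketch, not a definitive implementation. The helper names read_cell_column and write_demo_file, the file name demo_refs.h5, and the tiny synthetic arrays are all hypothetical; the demo file only imitates the reference layout shown in the question so that the snippet runs without the real test.mat. The transpose is optional: it assumes you want each array in MATLAB's original orientation, since h5py sees MATLAB arrays with their axes reversed.

```python
import h5py
import numpy as np


def read_cell_column(f, name):
    """Return a list of NumPy arrays, one per cell of the dataset `name`."""
    refs = f[name][()]  # 2-D object array of HDF5 object references
    # Dereference each reference and read the whole array it points to.
    # .T reverses the axes back to MATLAB's orientation (optional).
    return [f[ref][()].T for ref in refs.flat]


def write_demo_file(path):
    """Build a small stand-in file (hypothetical data, same reference layout)."""
    with h5py.File(path, "w") as f:
        grp = f.create_group("#refs#")
        # Three "cells" of different lengths, stored as (3, n) like in the question.
        cells = [np.arange(3 * n, dtype="f8").reshape(3, n) for n in (2, 5, 4)]
        refs = f.create_dataset("nuc", (1, len(cells)), dtype=h5py.ref_dtype)
        for i, cell in enumerate(cells):
            refs[0, i] = grp.create_dataset(f"cell{i}", data=cell).ref


if __name__ == "__main__":
    write_demo_file("demo_refs.h5")
    with h5py.File("demo_refs.h5", "r") as f:
        nuc = read_cell_column(f, "nuc")
    print([a.shape for a in nuc])
```

With the real file, the call would be read_cell_column(f, 'nuc') or read_cell_column(f, 'faces') on the already opened f. A list is used instead of a single array because the cells have different lengths (1512, 1491, 1556, ... in the question).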

This SO Q&A gives a basic explanation on reading .mat files:
read-matlab-v7-3-file-into-python-list-of-numpy-arrays-via-h5py

I wrote a more complete explanation (for reading SVHN datasets). You can access it here:
what-is-the-difference-between-the-two-ways-of-accessing-the-hdf5-group-in-svhn

Upvotes: 2
