Reputation: 16987
I know this has been asked before, but in my opinion none of the existing answers explain what is going on, and they don't happen to work for my case. I have a MATLAB v7.3 file that is structured like so:
f.mat
---> rank <1x454 cell> ---> each element is <53x50 double>
---> compare <1x454 cell> ---> each element is <53x50 double>
I hope this is straightforward enough. What I am trying to do is read all 454 arrays with dimensions 53x50 from the cell array named 'rank' into a list of NumPy arrays in Python, using the h5py library like so:
import h5py
import numpy as np

with h5py.File("f.mat", "r") as f:
    data = [np.array(element) for element in f['rank']]
What I end up with is a list of arrays of HDF5 object references:
In [53]: data[0]
Out[53]: array([<HDF5 object reference>], dtype=object)
What do I do with these references, and how do I get the list of arrays that I need?
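To reproduce this without MATLAB, here is a minimal sketch that assumes a v7.3 cell array looks the way h5py reports it above: a dataset (`rank`) whose elements are HDF5 object references pointing at the actual matrices. The `#refs#` group, the file name `demo.mat`, and the reduced size (3 cells instead of 454) are illustrative assumptions, not details read from a real MATLAB file.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in for f.mat: each cell's matrix lives in its own dataset,
# and 'rank' (shape (3, 1)) only stores references to those datasets.
path = os.path.join(tempfile.mkdtemp(), "demo.mat")
with h5py.File(path, "w") as f:
    refs_group = f.create_group("#refs#")
    rank = f.create_dataset("rank", shape=(3, 1), dtype=h5py.ref_dtype)
    for i in range(3):
        target = refs_group.create_dataset(f"r{i}", data=np.full((50, 53), float(i)))
        rank[i, 0] = target.ref  # the cell holds a reference, not the data itself

# Same loop as in the question: each row is a one-element object array.
with h5py.File(path, "r") as f:
    rows = [np.array(element) for element in f["rank"]]
    print(rows[0], type(rows[0][0]))  # shows what iteration actually yields
```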
Upvotes: 2
Views: 8378
Reputation: 534
Try mat73; it works like a charm.
pip install mat73
import mat73
data_dict = mat73.loadmat('train/digitStruct.mat')
Upvotes: 2
Reputation: 231385
Just by way of comparison, in Octave I created and wrote:
X = cell(1,10)
for i = 1:10
  X{i}=ones(i,i)
end
save Xcell1 -hdf5 X
then in Python:
import h5py
f=h5py.File('Xcell1','r')
grp=f['X']
grpv=grp['value']
X=list(grpv.items())
# [()] reads a whole dataset; the older .value attribute was removed in h5py 3.0
[x[1]['value'][()] for x in X[:-1]]  # list of those 10 arrays
X[-1][1][()]  # (10,1) the cell array shape
or, in one line:
X = [f['/X/value/_0{}/value'.format(i)][()] for i in range(0,10)]
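The one-liner above hard-codes ten cells and the `_0{}` naming. A slightly more general sketch, assuming the Octave layout shown in the `visititems` output below (a `value` group holding one `_NN` subgroup per cell, each with its own `value` dataset, plus a `dims` dataset), sorts the cell keys numerically instead. The helper `read_octave_cell` and the demo file `xcell_demo.h5` are hypothetical; the demo is built with h5py to stand in for `Xcell1` and omits Octave's `type` entries, which the reader does not need.

```python
import os
import tempfile

import h5py
import numpy as np

def read_octave_cell(path, name):
    """Return the elements of an Octave cell array saved with `save -hdf5`.

    Assumes each cell lives at /<name>/value/_<index>/value and that the
    cell-array shape is stored separately in /<name>/value/dims.
    """
    with h5py.File(path, "r") as f:
        cells = f[name]["value"]
        # Skip 'dims' and order '_00', '_01', ... by their numeric index.
        keys = sorted((k for k in cells if k != "dims"), key=lambda k: int(k[1:]))
        # [()] also works on scalar (0-d) datasets, where [:] would fail.
        return [cells[k]["value"][()] for k in keys]

# Hypothetical stand-in for Xcell1 with 12 cells, so indices go past _09.
path = os.path.join(tempfile.mkdtemp(), "xcell_demo.h5")
with h5py.File(path, "w") as f:
    value = f.create_group("X/value")
    for i in range(12):
        value.create_dataset(f"_{i:02d}/value", data=np.ones((i + 1, i + 1)))
    value.create_dataset("dims", data=np.array([12, 1]))

arrays = read_octave_cell(path, "X")
```

Using `[()]` rather than `[:]` matters if a 1x1 cell is stored as a scalar dataset, as the `type: b'scalar'` line for `_00` suggests.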
With a callback function that I wrote for https://stackoverflow.com/a/27699851/901925, the file can be viewed with:
f.visititems(callback)
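The callback from the linked answer isn't reproduced here. As a rough, hypothetical stand-in: `visititems` calls a function with `(name, obj)` for every group and dataset below the starting object, so a callback that records and prints each dataset's path, shape, and contents could look like this. The demo file `visit_demo.h5` and its contents are assumptions made so the sketch runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np

found = []  # (path, shape) pairs collected by the callback

def callback(name, obj):
    """Called by visititems for every group and dataset; prints datasets only."""
    if isinstance(obj, h5py.Dataset):
        found.append((name, obj.shape))
        print(f"name: {name}  shape: {obj.shape}\n{obj[()]}")
    # Implicitly returns None; a non-None return value would stop the walk early.

# Small demo file standing in for Xcell1 (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "visit_demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("X/value/_00/value", data=1.0)
    f.create_dataset("X/value/_01/value", data=np.ones((2, 2)))
    f.create_dataset("X/value/dims", data=np.array([2, 1]))

with h5py.File(path, "r") as f:
    f.visititems(callback)
```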
producing:
name: X
type: b'cell'
name: X/value/_00
type: b'scalar'
1.0
name: X/value/_01
type: b'matrix'
[[ 1. 1.]
[ 1. 1.]]
name: X/value/_02
type: b'matrix'
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
name: X/value/_03
...
dims: [10 1]
Upvotes: 1
Reputation: 16987
Well, I found the solution to my problem. If anyone has a better solution or can explain it better, I'd still like to hear it.
Basically, each <HDF5 object reference> has to be used to index the h5py File object, which returns the underlying dataset being referenced. Once we have that dataset, it has to be read into memory by indexing it with [:], or with any subset if only part of the array is required. Here is what I mean (each row of f['rank'] holds a single reference, hence element[0]):
import h5py

with h5py.File("f.mat", "r") as f:
    data = [f[element[0]][:] for element in f['rank']]
and the result:
In [79]: data[0].shape
Out[79]: (50L, 53L)
In [80]: data[0].dtype
Out[80]: dtype('float64')
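Note that `data[0].shape` comes out as (50, 53) even though MATLAB reports each element as 53x50: MATLAB stores arrays column-major, so h5py sees the dimensions reversed. A minimal sketch that wraps the approach above in a helper and transposes each array back to MATLAB's orientation follows. The helper name `load_mat73_cell` and the demo file (built with h5py to mimic a v7.3 cell array of references, with a `#refs#` group) are assumptions for illustration, not part of the original answer.

```python
import os
import tempfile

import h5py
import numpy as np

def load_mat73_cell(path, name):
    """Dereference every entry of a v7.3 cell array and return NumPy arrays.

    `.T` flips h5py's reversed (row-major) view back to MATLAB's shape.
    """
    with h5py.File(path, "r") as f:
        refs = f[name][()]  # object array of h5py.Reference, e.g. shape (454, 1)
        return [f[ref][()].T for ref in refs.flat]

# Hypothetical demo file: 4 cells, each a 53x50 MATLAB matrix seen by h5py as (50, 53).
path = os.path.join(tempfile.mkdtemp(), "cell_demo.mat")
originals = [np.arange(53 * 50, dtype=float).reshape(53, 50) + i for i in range(4)]
with h5py.File(path, "w") as f:
    refs_group = f.create_group("#refs#")
    cell = f.create_dataset("rank", shape=(len(originals), 1), dtype=h5py.ref_dtype)
    for i, arr in enumerate(originals):
        cell[i, 0] = refs_group.create_dataset(f"c{i}", data=arr.T).ref

data = load_mat73_cell(path, "rank")
```

Iterating `refs.flat` visits the references in row-major order of the stored (454, 1) dataset, which for a 1x454 cell is simply cell order.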
Hope this helps anyone in the future. I think this is the most general solution I've seen so far.
Upvotes: 12