Reputation: 2802
I have an HDF5 output file from NASTRAN that contains mode shape data. I am trying to read it into both Matlab and Python to check various post-processing techniques. The file in question is in the local directory for both of these tests. The file is semi-large at 1.2 GB, but certainly not that large in terms of HDF5 files I have read previously. The table I want to access has 17,567,342 rows and 8 columns; the first and last columns are integers, and the middle six are floating-point numbers.
Matlab:
file = 'HDF5.h5';
hinfo = hdf5info(file);
% ... Find the dataset I want to extract
t = hdf5read(file, '/NASTRAN/RESULT/NODAL/EIGENVECTOR');
This last operation is extremely slow (it takes hours).
Python:
import tables
hfile = tables.open_file("HDF5.h5")
modetable = hfile.root.NASTRAN.RESULT.NODAL.EIGENVECTOR
data = modetable.read()
This last operation is basically instant, and I can then access data as if it were a numpy array. I am clearly missing something very basic about what these commands are doing. I'm thinking it might have something to do with data conversion, but I'm not sure. If I do type(data) I get back numpy.ndarray, and type(data[0]) returns numpy.void.
What is the correct (i.e. speedy) way to read the dataset I want into Matlab?
Upvotes: 1
Views: 1553
Reputation: 2802
Matlab provides another HDF5 reader called h5read. Using the same basic approach, the time taken to read the data was drastically reduced. In fact, hdf5read is listed for removal in a future version. Here is the same basic code with the preferred functions.
file = 'HDF5.h5';
hinfo = h5info(file);
% ... Find the dataset I want to extract
t = h5read(file, '/NASTRAN/RESULT/NODAL/EIGENVECTOR');
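If you only need part of the table, h5read also accepts start and count arguments to read a contiguous block of rows. A minimal sketch (the 100000-row count is arbitrary, and I have not timed this against a file of this size):

file = 'HDF5.h5';
ds = '/NASTRAN/RESULT/NODAL/EIGENVECTOR';
% Read rows 1 through 100000 of the 1-D compound dataset.
t = h5read(file, ds, 1, 100000);
% For compound data, h5read returns a struct of column arrays,
% e.g. t.ID, t.X, ..., t.DOMAIN_ID.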
Upvotes: 1
Reputation: 8006
Matt, are you still working on this problem?
I am not a Matlab guy, but I am familiar with Nastran HDF5 files. You are right; 1.2 GB is big, but not that big by today's standards.
You might be able to diagnose the Matlab performance bottleneck by running tests with different numbers of rows in your EIGENVECTOR dataset. To do that (without running a lot of Nastran jobs), I wrote some simple code that creates an HDF5 file with a user-defined number of rows. It mimics the structure of the Nastran eigenvector result dataset. See below:
import tables as tb
import numpy as np

hfile = tb.open_file('SO_54300107.h5', 'w')

# Dtype mimicking the Nastran eigenvector result table:
# an integer ID, six floats, and an integer DOMAIN_ID.
eigen_dtype = np.dtype([('ID', int), ('X', float), ('Y', float), ('Z', float),
                        ('RX', float), ('RY', float), ('RZ', float),
                        ('DOMAIN_ID', int)])

fsize = 1000.0          # number of rows; change this to scale the file
isize = int(fsize)

# Build a record array and fill each column.
recarr = np.recarray((isize,), dtype=eigen_dtype)
id_arr = np.arange(1, isize + 1)
dom_arr = np.ones((isize,), dtype=int)
arr = np.arange(fsize) / fsize
recarr['ID'] = id_arr
recarr['X'] = arr
recarr['Y'] = arr
recarr['Z'] = arr
recarr['RX'] = arr
recarr['RY'] = arr
recarr['RZ'] = arr
recarr['DOMAIN_ID'] = dom_arr

# Write the table at the same path Nastran uses, creating parent groups.
modetable = hfile.create_table('/NASTRAN/RESULT/NODAL', 'EIGENVECTOR',
                               createparents=True, obj=recarr)
hfile.close()
Try running this with different values of fsize (the number of rows), then read the HDF5 file it creates in Matlab. Maybe you can find the point where performance noticeably degrades.
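On the Matlab side, a minimal timing sketch along these lines could compare the two readers on each generated file (the file and dataset names match the script above; I have not run this myself):

file = 'SO_54300107.h5';
ds = '/NASTRAN/RESULT/NODAL/EIGENVECTOR';
% Time the legacy reader and the preferred reader on the same dataset.
tic; t_old = hdf5read(file, ds); fprintf('hdf5read: %.3f s\n', toc);
tic; t_new = h5read(file, ds);   fprintf('h5read:   %.3f s\n', toc);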
Upvotes: 1