I have an existing HDF5 file with multiple tables. I want to modify this HDF5 file: in one of the tables I want to drop some rows entirely, and modify values in the remaining rows.
I tried the following code:
import h5py
import numpy as np

with h5py.File("my_file.h5", "r+") as f:
    # Get array
    table = f["/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX"]
    arr = np.array(table)
    # Modify array
    arr = arr[arr[:, 1] == 2]
    arr[:, 1] = 1
    # Write array back
    table[...] = arr
This code however results in the following error when run:
Traceback (most recent call last):
  File "C:\_Work\test.py", line 10, in <module>
    arr[arr[:, 1] == 2]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
So one of the problems seems to be that the numpy array arr that I've created is not a two-dimensional array. However, I'm not sure exactly how to create a two-dimensional array out of the HDF5 table (or whether that is even the best approach here).
Would anyone here be able to help put me on the right path?
Output from h5dump on my dataset is as follows:
HDF5 "C:\_Work\my_file.h5" {
DATASET "/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX" {
   DATATYPE  H5T_COMPOUND {
      H5T_STD_I64LE "EID";
      H5T_STD_I64LE "PLY";
      H5T_IEEE_F64LE "X1R";
      H5T_IEEE_F64LE "Y1R";
      H5T_IEEE_F64LE "T1R";
      H5T_IEEE_F64LE "L1R";
      H5T_IEEE_F64LE "L2R";
      H5T_IEEE_F64LE "X1I";
      H5T_IEEE_F64LE "Y1I";
      H5T_IEEE_F64LE "T1I";
      H5T_IEEE_F64LE "L1I";
      H5T_IEEE_F64LE "L2I";
      H5T_STD_I64LE "DOMAIN_ID";
   }
   DATASPACE  SIMPLE { ( 990 ) / ( H5S_UNLIMITED ) }
   ATTRIBUTE "version" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
   }
}
}
This answer is specifically focused on OP's request in comments to "throw away all rows where the value for PLY is not 2. Then in the remaining rows change the value for PLY from 2 to 1".
The procedure is relatively straightforward, if you know the tricks. The key step, with a matching comment in the code: np.nonzero() returns the row indices that satisfy the condition stress_ds['PLY'] == 2, and those indices are then used to slice the matching rows from the dataset. Code below:
import numpy as np
import h5py

with h5py.File('quad4_comp_cplx_test.h5', 'r+') as h5f:
    # Create stress dataset object
    stress_ds = h5f['/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX']
    print(stress_ds.shape)
    # Rename/move original output dataset to a saved name
    h5f.move('/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX',
             '/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX_save')
    # Slice a stress array from the dataset using indices where PLY == 2
    mod_stress_arr = stress_ds[np.nonzero(stress_ds['PLY'] == 2)]
    print(mod_stress_arr.shape)
    # Modify PLY ID from 2 to 1 for all rows
    mod_stress_arr['PLY'] = 1
    # Finally, save the modified stress array to a dataset with the original name
    h5f.create_dataset('/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX',
                       data=mod_stress_arr)
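To make the procedure concrete, here is a sketch of the same steps run against a small synthetic file. The file name demo.h5, the three-field dtype, and all values are invented for illustration; the real dataset has the full field list from the h5dump output.

```python
import numpy as np
import h5py

PATH = '/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX'
# Reduced field set for the demo; the real dataset has all 13 fields
demo_dtype = np.dtype([('EID', 'i8'), ('PLY', 'i8'), ('X1R', 'f8')])

# Build a small demo file: one PLY==1 row and two PLY==2 rows
with h5py.File('demo.h5', 'w') as h5f:
    data = np.array([(101, 1, 0.5), (101, 2, 0.7), (102, 2, 0.9)],
                    dtype=demo_dtype)
    h5f.create_dataset(PATH, data=data, maxshape=(None,))

# Same procedure as above: move the original dataset aside, slice the
# PLY==2 rows, change PLY to 1, and recreate under the original name
with h5py.File('demo.h5', 'r+') as h5f:
    stress_ds = h5f[PATH]
    h5f.move(PATH, PATH + '_save')
    mod_stress_arr = stress_ds[np.nonzero(stress_ds['PLY'] == 2)]
    mod_stress_arr['PLY'] = 1
    h5f.create_dataset(PATH, data=mod_stress_arr)

with h5py.File('demo.h5', 'r') as h5f:
    print(h5f[PATH]['PLY'])   # the 2 remaining rows now have PLY == 1
```

Note that the stress_ds object stays valid after h5f.move(), because moving only renames the link; the object itself is unchanged.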
I am familiar with MSC.Nastran's HDF5 output. First some background. HDF5 datasets can store homogeneous data (for example, all floats or ints) or heterogeneous data (required when you have mixed data types). Nastran creates heterogeneous datasets. Heterogeneous data is stored in rows with field names
that define the "columns" (kind of like a spreadsheet).
Some of the confusion of dataset vs table comes from PyTables' nomenclature. PyTables has different objects for homogeneous and heterogeneous data; its table object is used for heterogeneous data. h5py behaves very much like numpy: it uses a dataset object for both kinds of data, and dataset behavior is similar to numpy arrays. (For example, when you are reading data, you don't need to create an array -- you can simply reference the dataset object.) You determine the datatype by inspecting the dataset's dtype attribute. The output of this attribute is a list of tuples with the field name and datatype (and an array dimension for vector/tensor data).
Code below shows how to get the datatype for your data:
with h5py.File("my_file.h5", "r+") as h5f:
    # create a dataset object
    stress_ds = h5f["/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX"]
    print(stress_ds.dtype)
Based on your h5dump
output, I expect you will get something like this (I'm 99% sure this dataset doesn't have any array data):
[('EID', 'i8'), ('PLY', 'i8'),
 ('X1R', 'f8'), ('Y1R', 'f8'), ('T1R', 'f8'), ('L1R', 'f8'), ('L2R', 'f8'),
 ('X1I', 'f8'), ('Y1I', 'f8'), ('T1I', 'f8'), ('L1I', 'f8'), ('L2I', 'f8'),
 ('DOMAIN_ID', 'i8')]
Based on your h5dump output (a dataspace of ( 990 )), you have 990 rows of data (the total over all elements and plies). It's like a 2-d array, but you reference rows with integer indices and columns with field names.
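You can see this row/field addressing with a plain numpy structured array, no HDF5 file required. The dtype below mimics three of the QUAD4_COMP_CPLX fields, and all values are invented:

```python
import numpy as np

# Structured array with a few of the QUAD4_COMP_CPLX fields (values invented)
arr = np.array([(101, 1, 0.5), (101, 2, 0.7), (102, 1, 0.9)],
               dtype=[('EID', 'i8'), ('PLY', 'i8'), ('X1R', 'f8')])

print(arr.shape)              # (3,) -- 1-d: rows only, no column axis
print(arr[0])                 # first row (a record with all fields)
print(arr['EID'])             # the 'EID' "column": [101 101 102]
print(arr[arr['PLY'] == 2])   # rows where PLY == 2
```

This is also why your original arr[:, 1] failed: a structured array is one-dimensional, so the second index is addressed by field name, not by integer.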
That covers the basics. Now on to extract and manipulate the data. First, use this line to extract the entire dataset to an array. Notice the [()]
at the end. It tells h5py
to extract the entire array. You can also use numpy slice notation to extract subsets of the data. More on that later.
stress_arr = h5f["/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX"][()]
# or if you created the stress_ds object, use
stress_arr = stress_ds[()]
# checking the array dtype should give the same result as the dataset:
print(stress_arr.dtype)
So, that gives you the array of data to modify. Unfortunately, based on your code, I don't understand how you want to modify it.
Expanding on the slice nomenclature, you can use this notation to access any row or column (or combination). Several examples shown below:
stress_arr_0 = stress_ds[0] # gets all data for the 1st row
stress_arr_eids = stress_ds['EID'] # gets all element ids (only)
stress_arr_0_eid = stress_ds[0]['EID'] # gets eid for 1st row (only)
Writing data back to the array is done the same way. This would set the element id in the first row to 1000.
stress_arr[0]['EID'] = 1000
Writing data back to the dataset is done in a similar way (using integer row indices and field names). However, be careful here -- do you really want to modify your Nastran output? Seems dangerous to me.
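A minimal sketch of that dataset write-back, using a read-modify-write of one row on a throwaway file (the file name and values are invented; nothing here touches the real Nastran output):

```python
import numpy as np
import h5py

PATH = '/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX'
demo_dtype = np.dtype([('EID', 'i8'), ('PLY', 'i8')])

# Build a throwaway file with two rows
with h5py.File('writeback_demo.h5', 'w') as h5f:
    h5f.create_dataset(PATH,
                       data=np.array([(101, 1), (102, 1)], dtype=demo_dtype))

with h5py.File('writeback_demo.h5', 'r+') as h5f:
    stress_ds = h5f[PATH]
    # Read the row as a record, modify the field, write the record back
    row = stress_ds[0]
    row['EID'] = 1000
    stress_ds[0] = row

with h5py.File('writeback_demo.h5', 'r') as h5f:
    print(h5f[PATH]['EID'])   # first row's EID is now 1000
```

The change is written to disk as soon as the assignment to stress_ds[0] runs, which is exactly why modifying the real output file in place deserves caution.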