Reputation: 23
I am using Pandas in Python 3.7 to read data from an HDF5 file. The HDF5 file contains tables of results from MSC Nastran.
The HDF5 file is named 'ave_01.h5'.
The HDF5 table of displacements looks like this:
Using the following works just fine:
import numpy as np
import pandas as pd
pd.read_hdf('./ave_01.h5', 'NASTRAN/RESULT/NODAL/DISPLACEMENT')
However, I have another table for stress results, which looks like this:
So I would expect the following code to work, but it does not:
pd.read_hdf('./ave_01.h5', '/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD_CN')
I receive the following error:
ValueError: Wrong number of items passed 5, placement implies 1
I have noticed that this second table contains lists in some columns, whereas the first table does not; these lists contain 5 elements each. Perhaps this is causing the error, but I don't know whether that is true or how to correct for it.
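A quick check with h5py (assuming it is installed) seems to confirm that some fields in this table hold fixed-size arrays rather than scalars:
import h5py

# Inspect the compound dtype of the stress table; array-valued fields
# show a non-empty shape such as (5,), scalar fields show ()
with h5py.File('./ave_01.h5', 'r') as f:
    ds = f['/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD_CN']
    for name in ds.dtype.names:
        print(name, ds.dtype[name].shape)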
Where am I going wrong?
Thanks.
For reference, these results are of a simple test model, as can be seen below:
Upvotes: 0
Views: 2590
Reputation: 55
I can't give a better explanation than has already been given as to why your code does not work, but I have developed a workaround for my own use. From what I understand, the way MSC has chosen to format their "corner" results is what causes the issue. It would be nice if they either published a "how to" for reading corner data into a dataframe, or reorganized their data to play more nicely with the standard pandas read_hdf function, instead of leaving it to the end user to work out how to organize the data.
Replace "h5_file_path" and "table_path" with the necessary information for your specific file/table.
import h5py
import numpy as np
import pandas as pd
with h5py.File(h5_file_path, mode='r') as hdf5:
    # Column names come from the table's compound dtype
    header = hdf5[table_path][:].dtype.names
    # Expand each table row into one row per entry in GRID, picking the value
    # at the matching grid position out of every array-valued field
    data = [[item[np.where(row['GRID'] == grid)][0] if isinstance(item, np.ndarray) else item for item in row]
            for row in hdf5[table_path][:]
            for grid in row['GRID']]
    df = pd.DataFrame(data=data, columns=header)
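For example, with the file and table from the original question, the two variables would be set like this before running the block above (the paths are only illustrative; use your own):
h5_file_path = './ave_01.h5'
table_path = '/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD_CN'
# The resulting df then has one row per element/grid combination, with each
# array-valued field reduced to the value at the matching grid.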
Upvotes: 0
Reputation: 8091
A quick clarification is in order regarding the format of the data in the HDF5 file created by MSC Nastran. The values are not Python lists, but NumPy arrays. I know, it's deceptive, as both datatypes use [val1, val2, val3], and both use indexes to access individual elements. However, they are not the same. You can confirm this by checking the datatype for each field using the .dtype attribute, as shown below.
Each array has values at multiple element locations. This occurs when your Nastran stress request has (BOTH); you get output at the centroid and at the corners/grids. The locations match the grid IDs in the GRID field.
Here is a simple example working with Quad4 element data. The process is similar for other element types:
In [1]: import h5py
In [2]: h5f = h5py.File('tube_a_mesh.h5', 'r')
In [3]: str_ds = h5f['/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD_CN']
In [4]: print (str_ds.dtype)
{'names':['EID','TERM','GRID','FD1','X1','Y1','TXY1','FD2','X2','Y2','TXY2','DOMAIN_ID'],
'formats':['<i8','S4',('<i8', (5,)),('<f8', (5,)),('<f8', (5,)),('<f8', (5,)),('<f8', (5,)),('<f8', (5,)),('<f8', (5,)),('<f8', (5,)),('<f8', (5,)),'<i8'], 'offsets':[0,8,16,56,96,136,176,216,256,296,336,376],
'itemsize':384}
The dtype shows GRID is ('<i8', (5,)) and X1 is ('<f8', (5,)) (and the same dtype for the other stress values: Y1, TXY1, etc.).
Continuing, this is how to extract the Sx stresses at the Z1 location as an HDF5 dataset object.
In [5]: quad_sx_arr= str_ds['X1']
In [6]: print (quad_sx_arr.dtype, quad_sx_arr.shape)
float64 (4428, 5)
Alternatively, this is how to extract all of the Sx stresses at the Z1 location as a NumPy array.
In [7]: quad_sx_arr= str_ds['X1'][:]
In [8]: print (quad_sx_arr.dtype, quad_sx_arr.shape)
float64 (4428, 5)
Finally, if you only want the centroid values (first element of each X1 array), this is how to extract them as a NumPy array.
In [9]: quad_csx_arr = quad_sx_arr[:,0]
In [10]: print (quad_csx_arr.dtype, quad_csx_arr.shape)
float64 (4428,)
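If the end goal is a pandas DataFrame, a minimal sketch from here (assuming only the centroidal Sx values are wanted, reusing the arrays extracted above; the column name X1_CENTROID is my own) would be:
import pandas as pd

# Pair each element ID with its centroidal Sx value (index 0 of each X1 array)
quad_df = pd.DataFrame({'EID': str_ds['EID'][:],
                        'X1_CENTROID': quad_csx_arr})
print(quad_df.head())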
Upvotes: 0
Reputation: 131
You are correct: the issue is associated with the columns that contain lists of 5 elements.
I was able to replicate the issue on my end. In my case, the lists have 9 elements, but the read_hdf function expects only one value per table cell.
Below is my Python code with Pandas. Unfortunately, I was not able to work around the issue.
I was able to successfully move forward by using the h5py library instead. Further down is my Python code with the h5py library.
Pandas
Working example
import pandas as pd
test_output = pd.read_hdf('./nug_46.h5', '/NASTRAN/RESULT/NODAL/DISPLACEMENT')
print(test_output)
# returns
# ID X Y Z RX RY RZ DOMAIN_ID
# 0 3 -0.000561 -0.001269 0.001303 0.0 0.0 0.0 2
# 1 5 -0.001269 -0.000561 0.001303 0.0 0.0 0.0 2
# 2 6 -0.001342 -0.000668 0.001181 0.0 0.0 0.0 2
# 3 7 -0.001342 -0.000794 0.001162 0.0 0.0 0.0 2
# 4 8 -0.001335 -0.000893 0.001120 0.0 0.0 0.0 2
# ... ... ... ... ... ... ... ... ...
# 4878 20475 0.000000 0.000000 0.000000 0.0 0.0 0.0 2
# 4879 20478 0.000000 0.000000 0.000000 0.0 0.0 0.0 2
# 4880 100001 0.000000 0.000000 0.000000 0.0 0.0 0.0 2
# 4881 100002 0.000000 0.000000 0.000000 0.0 0.0 0.0 2
# 4882 100003 0.000000 0.000000 0.000000 0.0 0.0 0.0 2
Non-working example
test_output = pd.read_hdf('./nug_46.h5', 'NASTRAN/RESULT/ELEMENTAL/STRESS/HEXA')
print(test_output)
# returns an error
# Traceback (most recent call last):
# File "/home/apricot/PycharmProjects/python_hdf5_reader/venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1654, in create_block_manager_from_blocks
# make_block(values=blocks[0], placement=slice(0, len(axes[0])))
# File "/home/apricot/PycharmProjects/python_hdf5_reader/venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 3041, in make_block
# return klass(values, ndim=ndim, placement=placement)
# File "/home/apricot/PycharmProjects/python_hdf5_reader/venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 125, in __init__
# f"Wrong number of items passed {len(self.values)}, "
# ValueError: Wrong number of items passed 9, placement implies 1
H5PY
Working example
import h5py
file = h5py.File('./nug_46.h5', 'r')
# Open the dataset of compound type
dataset = file['/NASTRAN/RESULT/ELEMENTAL/STRESS/HEXA']
# Print the column names
column_names = dataset.dtype.names
print(column_names)
# returns
# ('EID', 'CID', 'CTYPE', 'NODEF', 'GRID', 'X', 'Y', 'Z', 'TXY', 'TYZ', 'TZX', 'DOMAIN_ID')
# Print the first ten rows of the dataset
# If you want to print the whole dataset, leave out the slice,
# e.g. enumerate(dataset)
for i, line in enumerate(dataset[0:10]):
    print(line)
# returns
# (447, 0, b'GRID', 8, [ 0, 5, 6, 12, 11, 1716, 1340, 1346, 1345], ..., 2)
# (448, 0, b'GRID', 8, [ 0, 6, 7, 13, 12, 1340, 1341, 1347, 1346], ..., 2)
# (449, 0, b'GRID', 8, [ 0, 7, 8, 14, 13, 1341, 1342, 1348, 1347], ..., 2)
# (450, 0, b'GRID', 8, [ 0, 8, 9, 15, 14, 1342, 1343, 1349, 1348], ..., 2)
# (451, 0, b'GRID', 8, [ 0, 9, 10, 16, 15, 1343, 1344, 1350, 1349], ..., 2)
# (452, 0, b'GRID', 8, [ 0, 11, 12, 18, 17, 1345, 1346, 1352, 1714], ..., 2)
# (453, 0, b'GRID', 8, [ 0, 12, 13, 19, 18, 1346, 1347, 1353, 1352], ..., 2)
# (454, 0, b'GRID', 8, [ 0, 13, 14, 20, 19, 1347, 1348, 1354, 1353], ..., 2)
# (455, 0, b'GRID', 8, [ 0, 14, 15, 21, 20, 1348, 1349, 1355, 1354], ..., 2)
# (456, 0, b'GRID', 8, [ 0, 15, 16, 22, 21, 1349, 1350, 1356, 1355], ..., 2)
# Print the 2nd row, 1st column in the dataset
print(dataset[1][column_names[0]])
# returns
# 448
# Print the 2nd row, 5th column, 3rd element of the list in the dataset
print(dataset[1][column_names[4]][2])
# returns
# 7
# Same as above, but by using the column name
print(dataset[1]['GRID'][2])
# returns
# 7
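If you still want to end up with a DataFrame, one option is to flatten the compound rows yourself before handing them to pandas. The sketch below assumes the same file and dataset, keeps only EID, GRID and the X stress component, and assumes the stress fields are stored as 9-wide arrays like GRID (which the error message above suggests):
import h5py
import numpy as np
import pandas as pd

with h5py.File('./nug_46.h5', 'r') as file:
    dataset = file['/NASTRAN/RESULT/ELEMENTAL/STRESS/HEXA']
    eid = dataset['EID'][:]      # one scalar per element
    grid = dataset['GRID'][:]    # 9 values per element
    sx = dataset['X'][:]         # assumed 9 values per element, like GRID

# One DataFrame row per element/grid pair: repeat the scalar EID to match the
# flattened array columns
df = pd.DataFrame({'EID': np.repeat(eid, grid.shape[1]),
                   'GRID': grid.ravel(),
                   'X': sx.ravel()})
print(df.head())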
Upvotes: 0