Mahesh

Reputation: 1016

How to create datasets within a group in hdf5 file?

I want to create a group with the path "particles/lipids/positions" that contains the datasets: particles is the main group, lipids holds the lipid names, and positions holds the positions of each lipid in every frame. I tried the code below (adapted from a previous answer), but I get the following error at line 40 of the script:

 ValueError: Unable to create group (name already exists)

import struct
import numpy as np
import h5py

csv_file = 'com'

fmtstring = '7s 8s 5s 7s 7s 7s'
fieldstruct = struct.Struct(fmtstring)
parse = fieldstruct.unpack_from

#define a np.dtype for gro array/dataset (hard-coded for now)
gro_dt = np.dtype([('col1', 'S7'), ('col2', 'S8'), ('col3', int), 
                   ('col4', float), ('col5', float), ('col6', float)])

with open(csv_file, 'r') as f, \
     h5py.File('xaa.h5', 'w') as hdf:
         
    step = 0
    while True:         
        header = f.readline()
        if not header:
            print("End Of File")
            break
        else:
            print(header)

        # get number of data rows
        no_rows = int(f.readline())
        arr = np.empty(shape=(no_rows,), dtype=gro_dt)
        for row in range(no_rows):
            fields = parse( f.readline().encode('utf-8') )
            arr[row]['col1'] = fields[0].strip()            
            arr[row]['col2'] = fields[1].strip()            
            arr[row]['col3'] = int(fields[2])
            arr[row]['col4'] = float(fields[3])
            arr[row]['col5'] = float(fields[4])
            arr[row]['col6'] = float(fields[5])
        if arr.shape[0] > 0:
            # Create a group to store positions
            particles_grp = hdf.create_group('particles/lipids/positions')
            # create a dataset for THIS time step
            ds = particles_grp.create_dataset(f'dataset_{step:04}', data=arr, compression='gzip')
            #ds = hdf.create_dataset(f'dataset_{step:04}', data=arr, compression='gzip')
            # create attributes for this dataset / time step
            hdr_tokens = header.split()
            particles_grp['ds'] = ds
            ds.attrs['raw_header'] = header
            #ds.attrs['Generated by'] = hdr_tokens[2]
            #ds.attrs['P/L'] = hdr_tokens[4].split('=')[1]
            ds.attrs['Time'] = hdr_tokens[6]
            
        footer = f.readline()
        step += 1

The small data file is linked here data file. In the present code, each frame is stored in dataset_0000, dataset_0001, and so on. I want these datasets to be stored in the particles group. I'm not sure this is the best layout, because I want to use these frames for further calculations later. Thanks!

Upvotes: 0

Views: 3530

Answers (2)

kcw78

Reputation: 8081

As noted in the previous answer, you try to create the same group inside the while loop with this call:
particles_grp = hdf.create_group('particles/lipids/positions')
You get an error the second time you call it (because the group already exists).

Instead, use this function to create the group object:
particles_grp = hdf.require_group('particles/lipids/positions')

require_group() is smart (and useful). If the group doesn't exist, it creates it; if the group already exists, it simply returns the existing group object.
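A minimal sketch of the difference (the file path here is illustrative):

```python
import os
import tempfile

import h5py

# Sketch (illustrative file path): create_group() fails the second time it is
# called with the same path, while require_group() returns the existing group.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with h5py.File(path, "w") as hdf:
    grp1 = hdf.create_group("particles/lipids/positions")   # first call: OK
    grp2 = hdf.require_group("particles/lipids/positions")  # same group back
    same_group = grp1.name == grp2.name                     # True

    raised = False
    try:
        hdf.create_group("particles/lipids/positions")      # duplicate path
    except ValueError:
        raised = True                                       # "name already exists"

print("same group:", same_group, "| duplicate create raised:", raised)
```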

Make that change to your code and it will work with no other changes.

Alternatively, you can move the create_group() call ABOVE the while True: loop (so it is only called once).

Upvotes: 1

Jeremy Savage

Reputation: 894

You are running the line of code:

particles_grp = hdf.create_group('particles/lipids/positions')

inside your while loop. This means you are trying to create the group inside the HDF5 file more than once, which is not possible (as the name is hard-coded). Try something like this:

with open(csv_file, 'r') as f, \
     h5py.File('xaa.h5', 'w') as hdf:
    # Create a group to store positions
    particles_grp = hdf.create_group('particles/lipids/positions')
    step = 0
    while True:         
        header = f.readline()
        if not header:
            print("End Of File")
            break
        else:
            print(header)

        # get number of data rows
        no_rows = int(f.readline())
        arr = np.empty(shape=(no_rows,), dtype=gro_dt)
        for row in range(no_rows):
            fields = parse( f.readline().encode('utf-8') )
            arr[row]['col1'] = fields[0].strip()            
            arr[row]['col2'] = fields[1].strip()            
            arr[row]['col3'] = int(fields[2])
            arr[row]['col4'] = float(fields[3])
            arr[row]['col5'] = float(fields[4])
            arr[row]['col6'] = float(fields[5])
        if arr.shape[0] > 0:
            # create a dataset for THIS time step
            ds = particles_grp.create_dataset(f'dataset_{step:04}', data=arr, compression='gzip')
            #ds = hdf.create_dataset(f'dataset_{step:04}', data=arr, compression='gzip')
            # create attributes for this dataset / time step
            hdr_tokens = header.split()
            particles_grp['ds'] = ds
            ds.attrs['raw_header'] = header
            #ds.attrs['Generated by'] = hdr_tokens[2]
            #ds.attrs['P/L'] = hdr_tokens[4].split('=')[1]
            ds.attrs['Time'] = hdr_tokens[6]
            
        footer = f.readline()
        step += 1

I assume this is the issue, judging from the error message; give this a go and let me know if it works.

HDF5 uses a hierarchical file structure similar to your file system. Imagine trying to create two directories (folders) with the same name in the same place: you can only have one. So create the group (the folder) once, then put the datasets (the files) inside it.

EDIT: it looks like you are going to run into a further issue here:

particles_grp['ds'] = ds

This line tries to create a second link named ds in the group on every pass through the loop, so it fails the second time for the same reason (the name already exists). Since create_dataset() already stores each dataset in the group under a unique name (dataset_0000, dataset_0001, ...), you can simply delete this line.
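For the follow-up calculations mentioned in the question, one way to get the frames back is to iterate over the group; the zero-padded names (dataset_0000, dataset_0001, ...) sort in step order. A small sketch with made-up data standing in for the parsed rows:

```python
import os
import tempfile

import h5py
import numpy as np

# Write one dataset per time step (dummy data), then read all frames back.
path = os.path.join(tempfile.mkdtemp(), "frames.h5")

with h5py.File(path, "w") as hdf:
    grp = hdf.create_group("particles/lipids/positions")
    for step in range(3):
        # each "frame" here is just a dummy array standing in for the parsed rows
        grp.create_dataset(f"dataset_{step:04}", data=np.full(4, float(step)))

with h5py.File(path, "r") as hdf:
    grp = hdf["particles/lipids/positions"]
    # sorted() keeps the frames in step order thanks to the zero-padded names
    means = {name: float(grp[name][:].mean()) for name in sorted(grp)}

print(means)  # {'dataset_0000': 0.0, 'dataset_0001': 1.0, 'dataset_0002': 2.0}
```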

Upvotes: 1
