Reputation: 1016
I want to create a group with the path "particles/lipids/positions" that contains two datasets: particles is the main group, lipids is a dataset containing the lipid names, and positions will contain the positions of the lipids in each frame. I tried the code below (based on a previous answer's code) but I'm getting the following error at line 40:
ValueError: Unable to create group (name already exists)
import struct
import numpy as np
import h5py

csv_file = 'com'
fmtstring = '7s 8s 5s 7s 7s 7s'
fieldstruct = struct.Struct(fmtstring)
parse = fieldstruct.unpack_from

# define a np.dtype for gro array/dataset (hard-coded for now)
gro_dt = np.dtype([('col1', 'S7'), ('col2', 'S8'), ('col3', int),
                   ('col4', float), ('col5', float), ('col6', float)])

with open(csv_file, 'r') as f, \
     h5py.File('xaa.h5', 'w') as hdf:
    step = 0
    while True:
        header = f.readline()
        if not header:
            print("End Of File")
            break
        else:
            print(header)
        # get number of data rows
        no_rows = int(f.readline())
        arr = np.empty(shape=(no_rows,), dtype=gro_dt)
        for row in range(no_rows):
            fields = parse(f.readline().encode('utf-8'))
            arr[row]['col1'] = fields[0].strip()
            arr[row]['col2'] = fields[1].strip()
            arr[row]['col3'] = int(fields[2])
            arr[row]['col4'] = float(fields[3])
            arr[row]['col5'] = float(fields[4])
            arr[row]['col6'] = float(fields[5])
        if arr.shape[0] > 0:
            # Create a group to store positions
            particles_grp = hdf.create_group('particles/lipids/positions')
            # create a dataset for THIS time step
            ds = particles_grp.create_dataset(f'dataset_{step:04}', data=arr, compression='gzip')
            #ds = hdf.create_dataset(f'dataset_{step:04}', data=arr, compression='gzip')
            # create attributes for this dataset / time step
            hdr_tokens = header.split()
            particles_grp['ds'] = ds
            ds.attrs['raw_header'] = header
            #ds.attrs['Generated by'] = hdr_tokens[2]
            #ds.attrs['P/L'] = hdr_tokens[4].split('=')[1]
            ds.attrs['Time'] = hdr_tokens[6]
        footer = f.readline()
        step += 1
The small data file is linked here: data file. In the present code each frame is stored in dataset_0001, dataset_0002, and so on. I want these datasets to be stored in the particles group. I'm not quite sure if this is the best method for later use, because I want to use these frames for further calculations! Thanks!
Upvotes: 0
Views: 3530
Reputation: 8081
As noted in the previous answer, you try to create the same group inside the while loop with this call:
particles_grp = hdf.create_group('particles/lipids/positions')
You get an error the second time you call it (because the group already exists).
Instead, use this function to create the group object:
particles_grp = hdf.require_group('particles/lipids/positions')
require_group() is smart (and useful): if the group doesn't exist, it will create it, and when the group already exists, it will simply return the group object.
Make that change to your code and it will work with no other changes.
Alternately, you can move the create_group() call ABOVE the while True: loop (so it is only called once).
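For illustration, a minimal self-contained sketch of the difference (the file name demo_require.h5 is just a placeholder, not from the original code):

```python
import h5py

# require_group() is idempotent; create_group() is not.
with h5py.File('demo_require.h5', 'w') as hdf:
    g1 = hdf.require_group('particles/lipids/positions')
    g2 = hdf.require_group('particles/lipids/positions')  # returns the existing group
    same = g1.name == g2.name
    try:
        hdf.create_group('particles/lipids/positions')    # second create_group() fails
        raised = False
    except ValueError:                                    # "name already exists"
        raised = True
```

Here same is True and raised is True, which is exactly the ValueError from the question.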
Upvotes: 1
Reputation: 894
You are running this line of code:
particles_grp = hdf.create_group('particles/lipids/positions')
inside your while loop. This means you are trying to create the group inside the HDF5 file more than once, which is not possible (as the name is hard-coded). Try something like this:
with open(csv_file, 'r') as f, \
     h5py.File('xaa.h5', 'w') as hdf:
    # Create a group to store positions
    particles_grp = hdf.create_group('particles/lipids/positions')
    step = 0
    while True:
        header = f.readline()
        if not header:
            print("End Of File")
            break
        else:
            print(header)
        # get number of data rows
        no_rows = int(f.readline())
        arr = np.empty(shape=(no_rows,), dtype=gro_dt)
        for row in range(no_rows):
            fields = parse(f.readline().encode('utf-8'))
            arr[row]['col1'] = fields[0].strip()
            arr[row]['col2'] = fields[1].strip()
            arr[row]['col3'] = int(fields[2])
            arr[row]['col4'] = float(fields[3])
            arr[row]['col5'] = float(fields[4])
            arr[row]['col6'] = float(fields[5])
        if arr.shape[0] > 0:
            # create a dataset for THIS time step
            ds = particles_grp.create_dataset(f'dataset_{step:04}', data=arr, compression='gzip')
            #ds = hdf.create_dataset(f'dataset_{step:04}', data=arr, compression='gzip')
            # create attributes for this dataset / time step
            hdr_tokens = header.split()
            particles_grp['ds'] = ds
            ds.attrs['raw_header'] = header
            #ds.attrs['Generated by'] = hdr_tokens[2]
            #ds.attrs['P/L'] = hdr_tokens[4].split('=')[1]
            ds.attrs['Time'] = hdr_tokens[6]
        footer = f.readline()
        step += 1
I assume this is the issue based on the error message; give this a go and let me know if it works.
HDF5 uses a hierarchical file structure similar to your file system. Imagine you are trying to create two directories (folders) with the same name: you can only have one folder with a given name at each level. So create the group (folder) first and then put the files (datasets) in the group.
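To make the analogy concrete, here is a small sketch (the file name demo_tree.h5 and the frame_* names are placeholders): one group, created once, holding many uniquely named datasets.

```python
import h5py

# Like a directory: the group name is unique at its level,
# but many datasets ("files") can live inside it.
with h5py.File('demo_tree.h5', 'w') as hdf:
    grp = hdf.create_group('particles/lipids/positions')    # like "mkdir -p"
    grp.create_dataset('frame_0000', data=[1.0, 2.0, 3.0])  # "files" inside it
    grp.create_dataset('frame_0001', data=[4.0, 5.0, 6.0])
    names = list(grp.keys())
```

After this, names is ['frame_0000', 'frame_0001'] (h5py lists members in alphanumeric order by default).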
EDIT: it looks like you are going to run into a further issue here:
particles_grp['ds'] = ds
You need to create custom names for your datasets in your group, as you cannot have two with the same name.
Try something like this:
particles_grp[f'dataset_{step:04}'] = ds
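One way to sketch this per-step naming (file name demo_steps.h5 is a placeholder): since create_dataset() on the group already stores each dataset under the group with its given name, it is enough to make that name unique per step, and no separate assignment is needed in that case.

```python
import h5py
import numpy as np

with h5py.File('demo_steps.h5', 'w') as hdf:
    grp = hdf.create_group('particles/lipids/positions')
    for step in range(3):
        arr = np.arange(6, dtype=float) + step
        # each time step gets its own uniquely named dataset in the group
        grp.create_dataset(f'dataset_{step:04}', data=arr)
    n = len(grp)  # number of datasets stored in the group
```

Here n is 3: one dataset per frame, all inside the particles/lipids/positions group.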
Upvotes: 1