I am running a script that iterates through a folder and creates a set of groups in an HDF5 file that replicate the directory structure. I then go through the files and add the data from each file to an HDF5 dataset. However, I want to save each dataset into the matching group, and I'm unsure how to do that.
```python
#Development Script Save files in groups
#Create HDF5 File name
TestFilename = 'N:/TestingPyhonHDF5/progress/automation/GroupTesting.h5'
#Set target directory to extract data
TargetFolder = 'N:\Measurements\T2+\Rx-BB001'
# giving file extensions
bin = ('.bin')
csv = ('.CSV')
tmp = ('.tmp')
head = ('.head')
ext = ('.bin', '.head', '.tmp', '.CSV')
# Create HDF5 Structure
with h5py.File(TestFilename,'w') as tf:
    for root, dirs, _ in os.walk(TargetFolder, topdown=True):
        #print(f'ROOT: {root}')
        # for Windows, modify root: remove drive letter and replace backslashes:
        grp_name = root[2:].replace( '\\', '/')
        #print(f'grp_name: {grp_name}\n')
        tf.create_group(grp_name)
#Open HDF5 file
with h5py.File(TestFilename,'a') as tfile:
    #Iterate files to send to HDF5 file
    for path, dirc, files in os.walk(TargetFolder):
        for file in files:
            if file.endswith(bin):
                # Create a dtype with the binary data format and the desired column names
                filePath = os.path.join(path, file)
                dt = np.dtype('B')
                data = np.fromfile(filePath, dtype=dt)
                df = pd.DataFrame(data)
                #Save as csv
                savetxt('TempData.csv', df, delimiter=',')
                #Read bin to HDF5
                dfBIN = pd.read_csv('TempData.csv')
                tfile.create_dataset(grp_name/file, data=dfBIN) #put data in hdf file
                #add attrs
                os.remove("TempData.csv")
            else:
                continue
```
Currently the code fails with the following error:
```
TypeError                                 Traceback (most recent call last)
Cell In [51], line 39
     37 #Read bin to HDF5
     38 dfBIN = pd.read_csv('TempData.csv')
---> 39 tfile.create_dataset(grp_name/file, data=dfBIN) #put data in hdf file
     40 #add attrs
     41 os.remove("TempData.csv")

TypeError: unsupported operand type(s) for /: 'str' and 'str'
```
`/` only joins paths when at least one operand is a `pathlib` path object (such as `pathlib.Path`), and here both `grp_name` and `file` are plain strings.

Just do `f"{grp_name}/{file}"` or `posixpath.join(grp_name, file)` (`posixpath` ensures forward slashes no matter which platform you're on).
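A quick way to see the difference is to try all three forms side by side. The group and file names below (`/Measurements/T2+/Rx-BB001` and `scan_001.bin`) are placeholders, not values from a real run; the sketch shows that `str / str` raises the same `TypeError` as in the question, while the f-string, `posixpath.join` and `pathlib.PurePosixPath` all produce the same HDF5 object name.

```python
import posixpath
from pathlib import PurePosixPath

# Placeholder values standing in for grp_name and file from the question
grp_name = "/Measurements/T2+/Rx-BB001"
file = "scan_001.bin"

# str / str is not defined, which is exactly the TypeError in the question
try:
    grp_name / file
except TypeError as exc:
    print("TypeError:", exc)

# Option 1: plain string formatting
name_fstring = f"{grp_name}/{file}"

# Option 2: posixpath.join always joins with '/', even on Windows
name_posix = posixpath.join(grp_name, file)

# Option 3: PurePosixPath supports '/', then convert back to str for h5py
name_pure = str(PurePosixPath(grp_name) / file)

print(name_fstring)  # /Measurements/T2+/Rx-BB001/scan_001.bin
```

`PurePosixPath` (rather than `Path`) is used in option 3 so that the separator stays `/` on Windows too.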
You will also naturally need to do the same `grp_name` determination in the second loop:

```python
grp_name = path[2:].replace('\\', '/')
```

Otherwise you're using the last value of `grp_name` left over from the earlier loop.
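Because the same conversion is needed in both loops, it may be worth pulling it into a small helper. `to_group_name` is a hypothetical name, not part of h5py, and the sketch assumes Windows-style directory paths like the `N:\...` ones in the question. Parsing with `pathlib.PureWindowsPath` instead of slicing off the first two characters means the helper gives the same result when the script is tested on a non-Windows machine.

```python
from pathlib import PureWindowsPath

def to_group_name(dir_path: str) -> str:
    """Turn a Windows directory path into an absolute HDF5 group name.

    Hypothetical helper: drops the anchor (e.g. 'N:\\') if there is one and
    joins the remaining folder names with '/', the separator h5py expects.
    """
    p = PureWindowsPath(dir_path)
    # p.parts[0] is the anchor such as 'N:\\' when the path has one
    parts = p.parts[1:] if p.anchor else p.parts
    return "/" + "/".join(parts)

# For a drive-letter path this matches the slicing approach from the answer
root = r"N:\Measurements\T2+\Rx-BB001\Sub"
print(to_group_name(root))           # /Measurements/T2+/Rx-BB001/Sub
print(root[2:].replace("\\", "/"))   # /Measurements/T2+/Rx-BB001/Sub
```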
All in all, you may just want to go for a single loop. Since I don't know whether h5py silently ignores an attempt to (re)create a group that already exists, I added a set that keeps track of the groups already created. Also, the intermediate CSV file seemed quite extraneous, so I dropped it.
```python
import os
import posixpath

import h5py
import numpy as np
import pandas as pd

TestFilename = "N:/TestingPyhonHDF5/progress/automation/GroupTesting.h5"
# Raw string so the backslashes are not treated as escape sequences
TargetFolder = r"N:\Measurements\T2+\Rx-BB001"

groups_created = set()
with h5py.File(TestFilename, "a") as tfile:
    for path, dirc, files in os.walk(TargetFolder):
        for file in files:
            if file.endswith(".bin"):
                # Windows: drop the drive letter and switch to forward slashes
                grp_name = path[2:].replace("\\", "/")
                if grp_name not in groups_created:
                    tfile.create_group(grp_name)
                    groups_created.add(grp_name)
                # Read the raw file as unsigned 8-bit integers ("B" == uint8)
                filePath = os.path.join(path, file)
                dt = np.dtype("B")
                data = np.fromfile(filePath, dtype=dt)
                df = pd.DataFrame(data)
                # Store the dataset inside its group, named after the file
                tfile.create_dataset(posixpath.join(grp_name, file), data=df)
```
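Before writing anything into the HDF5 file, it can help to check which dataset names the single loop will produce. The sketch below is a hypothetical, stdlib-only dry run (`plan_datasets` is an illustrative name); it builds a throwaway folder tree with `tempfile` so it runs anywhere, and, unlike the answer, it derives group names from the path relative to the target folder instead of stripping the drive letter. On the h5py side, `Group.require_group` opens a group if it exists and creates it otherwise, so it could stand in for the `groups_created` set; `create_group` raises `ValueError` when the name is already taken, for example when rerunning with mode `"a"` on an existing file.

```python
import os
import posixpath
import tempfile

def plan_datasets(target_folder: str, ext: str = ".bin") -> dict:
    """Map each matching file to the HDF5 dataset name it would get.

    Hypothetical dry-run helper: group names mirror the folder structure
    relative to target_folder and are always joined with '/'.
    """
    plan = {}
    for path, _dirs, files in os.walk(target_folder):
        rel = os.path.relpath(path, target_folder)
        # relpath returns '.' for the top folder itself; map it to the root group
        parts = [] if rel == "." else rel.split(os.sep)
        grp_name = "/" + "/".join(parts)
        for file in files:
            if file.endswith(ext):
                plan[os.path.join(path, file)] = posixpath.join(grp_name, file)
    return plan

# Throwaway tree: top/a.bin, top/sub/b.bin, top/sub/notes.CSV
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for name in ("a.bin", os.path.join("sub", "b.bin"), os.path.join("sub", "notes.CSV")):
        with open(os.path.join(top, name), "wb") as fh:
            fh.write(bytes([1, 2, 3]))
    names = sorted(plan_datasets(top).values())
    print(names)  # ['/a.bin', '/sub/b.bin']
```

Each planned name could then be passed to `tfile.create_dataset(name, data=...)` in the real loop.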