Reputation: 641
I have an HDF5 file containing 28 datasets, each of different dimensions. For example, the first dataset is [60, 8] and the last one is [60, 1].
I want to loop through the HDF5 file, read all the data in each dataset, and write it to a pandas DataFrame. In the end I should have a DataFrame of size [60, 218]. So far I've tried the following code, but it hangs.
Could someone spot the error in my code and tell me a better way to do this?
import h5py
import pandas as pd

q = h5py.File('AM_B0_D3.7_2016-04-13T215000.flac.h5', 'r')  # reading the hdf5 file
dataset_names_list = []
q.visit(dataset_names_list.append)  # creating a list of datasets in the hdf5 file
ten_min_df = pd.DataFrame()
for i in dataset_names_list:
    x = q[i][:]
    if x.shape[1] > 1:
        col1 = [i + str(num) for num in range(0, x.shape[1])]
        temp = pd.DataFrame(data=x, columns=col1)
        ten_min_df = ten_min_df.append(temp)
    else:
        col2 = [i]
        temp = pd.DataFrame(data=x, columns=col2)
        ten_min_df = ten_min_df.append(temp)
Upvotes: 0
Views: 505
Reputation: 862481
I think you need a list of arrays and then use numpy.concatenate with the DataFrame constructor:
import numpy as np
import pandas as pd

np.random.seed(452)
first = np.random.rand(3,5)
print (first)
[[ 0.88642869 0.42677701 0.89968857 0.87976326 0.07758206]
[ 0.43617027 0.03221375 0.46398119 0.14226246 0.14237448]
[ 0.22679517 0.60271752 0.85003435 0.5676184 0.87565266]]
second=np.random.rand(3,2)
print (second)
[[ 0.89830548 0.27066452]
[ 0.23907483 0.73784657]
[ 0.09083235 0.98984701]]
third=np.random.rand(3,3)
L = [first, second, third]
df = pd.DataFrame(np.concatenate(L, axis=1))
print (df)
0 1 2 3 4 5 6 \
0 0.886429 0.426777 0.899689 0.879763 0.077582 0.898305 0.270665
1 0.436170 0.032214 0.463981 0.142262 0.142374 0.239075 0.737847
2 0.226795 0.602718 0.850034 0.567618 0.875653 0.090832 0.989847
7 8 9
0 0.837404 0.090284 0.764517
1 0.564904 0.489809 0.254518
2 0.426737 0.364310 0.328396
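Applied to the original loop, the same idea means collecting the per-dataset DataFrames in a list and doing a single column-wise pd.concat at the end, instead of calling DataFrame.append inside the loop (append concatenates along rows, which is why the result never becomes [60, 218]). A minimal sketch below uses in-memory NumPy arrays as stand-ins for the HDF5 datasets; the dataset names ds_a, ds_b, ds_c and their shapes are illustrative, not from the file:

```python
import numpy as np
import pandas as pd

# Stand-ins for the HDF5 datasets: 60 rows each, varying column counts
rng = np.random.default_rng(0)
datasets = {
    "ds_a": rng.random((60, 8)),
    "ds_b": rng.random((60, 3)),
    "ds_c": rng.random((60, 1)),
}

frames = []
for name, x in datasets.items():
    # One column per array column, named "<dataset><index>" as in the question
    if x.shape[1] > 1:
        cols = [name + str(num) for num in range(x.shape[1])]
    else:
        cols = [name]
    frames.append(pd.DataFrame(data=x, columns=cols))

# Single column-wise concatenation instead of repeated append
ten_min_df = pd.concat(frames, axis=1)
print(ten_min_df.shape)  # (60, 12)
```

With the real file, the loop body would read x = q[i][:] for each visited dataset name exactly as in the question; only the final pd.concat(frames, axis=1) changes.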
Upvotes: 1