Reputation: 47
How do I convert a big table in Pandas/NumPy to HDF5 (.h5) format while keeping the same structure? I used the following code, but the resulting .h5 file contains messy data:
data.to_hdf('data.h5', format='table', key='data')
I attached an image of my data.
Or which data type can you recommend?
This is the structure I received:
Upvotes: 2
Views: 3947
Reputation: 8091
Setting format='table' writes the data as a PyTables Table. When you do this, all the data will be in the 'table' dataset in the group defined by key=. However, columns of the same data type will be grouped together in one 'values_block_#' column/field (all ints, all floats, etc.). To write them separately, you also need data_columns=True. That defines the columns to be created as indexed data columns (set to True to use all columns).
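One practical effect of data columns is that they can be used in a where= filter when reading the file back. The snippet below is a minimal sketch of that, assuming pandas with PyTables installed; the scratch path and the cutoff value 30 are made up for illustration, and the data is a shortened version of the sample in the question.

```python
import os
import tempfile

import pandas as pd  # to_hdf/read_hdf also need the PyTables package ("tables")

df = pd.DataFrame({
    'id': [f'subj8_series8_{i}' for i in range(5)],
    'Fp1': [12, 157, 34, -98, 28],
    'Fp2': [60, 181, 111, 25, 120],
})

# Hypothetical scratch location so the sketch does not overwrite real files.
path = os.path.join(tempfile.mkdtemp(), 'query_demo.h5')

# Every column becomes its own queryable field in the table.
df.to_hdf(path, key='data', format='table', data_columns=True)

# Filter on a data column while reading; only matching rows are returned.
subset = pd.read_hdf(path, key='data', where='Fp1 > 30')
print(subset)
```

Without data_columns, the same where= filter on Fp1 would not be available, because Fp1 would be packed inside a shared values_block field.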
The example below demonstrates the differences between the options. It creates 3 different files using data from your example: one with the default (fixed) format, one with format='table', and one with format='table' plus data_columns=True. If you still don't like the layout with data_columns=True, you can use the h5py or tables (PyTables) package to create the HDF5 schema and write the data as you like.
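As a rough illustration of the h5py route, the sketch below writes the same kind of data as a single compound (record) dataset with one field per column. The schema is an assumption chosen for this example, not something prescribed by h5py: the dataset name 'data', the 32-byte fixed-length id field, and the 64-bit integer fields are all hypothetical choices.

```python
import os
import tempfile

import h5py
import numpy as np

# Assumed schema: one record per row, one field per column.
# 'S32' stores each id as a fixed-length byte string (up to 32 bytes).
row_dtype = np.dtype([('id', 'S32'), ('Fp1', 'i8'), ('Fp2', 'i8')])

ids = [f'subj8_series8_{i}' for i in range(5)]
fp1 = [12, 157, 34, -98, 28]
fp2 = [60, 181, 111, 25, 120]

# Build a NumPy structured array; ids are encoded to bytes explicitly.
records = np.array(
    [(i.encode('ascii'), a, b) for i, a, b in zip(ids, fp1, fp2)],
    dtype=row_dtype,
)

# Hypothetical scratch location for the output file.
path = os.path.join(tempfile.mkdtemp(), 'custom_schema.h5')
with h5py.File(path, 'w') as f:
    f.create_dataset('data', data=records)

# Read the dataset back to confirm the field layout.
with h5py.File(path, 'r') as f:
    stored = f['data'][:]
    field_names = stored.dtype.names

print(field_names)
print(stored['Fp1'])
```

A file written this way is not in the pandas layout, so it would be read back with h5py (or PyTables) rather than pd.read_hdf.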
Code below:
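import pandas as pd  # to_hdf also needs the PyTables package ("tables")

# Sample data modeled on the table in the question
ids = [f'subj8_series8_{i}' for i in range(5)] + \
      [f'subj8_series8_12409{i}' for i in range(5)]
Fp1 = [12, 157, 34, -98, 28,
       -160, -30, 64, 134, 159]
Fp2 = [60, 181, 111, 25, 120,
       192, 261, 322, 383, 407]
df = pd.DataFrame({'id': ids, 'Fp1': Fp1, 'Fp2': Fp2})

# 1) Default 'fixed' format
df.to_hdf('file_1.h5', key='data')
# 2) PyTables Table; same-dtype columns share a values_block_# field
df.to_hdf('file_2.h5', key='data', format='table')
# 3) PyTables Table; every column gets its own field
df.to_hdf('file_3.h5', key='data', format='table', data_columns=True)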
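To compare the three layouts without an HDF5 viewer, the files can be opened with PyTables and their node and field names listed. This is only a sketch: it rebuilds a shortened copy of the sample data, writes to a temporary directory instead of file_1.h5 to file_3.h5, and the option labels in the dictionary are made up for illustration.

```python
import os
import tempfile

import pandas as pd
import tables  # PyTables

df = pd.DataFrame({
    'id': [f'subj8_series8_{i}' for i in range(5)],
    'Fp1': [12, 157, 34, -98, 28],
    'Fp2': [60, 181, 111, 25, 120],
})

# Hypothetical labels mapped to the to_hdf keyword arguments being compared.
options = {
    'fixed': {},
    'table': {'format': 'table'},
    'table_data_columns': {'format': 'table', 'data_columns': True},
}

workdir = tempfile.mkdtemp()
layout = {}
for name, kwargs in options.items():
    path = os.path.join(workdir, f'{name}.h5')
    df.to_hdf(path, key='data', **kwargs)
    with tables.open_file(path, mode='r') as h5:
        # Names of the nodes stored directly under the /data group.
        children = [node._v_name for node in h5.list_nodes('/data')]
        # The table formats keep everything in one 'table' node; list its fields.
        fields = list(h5.root.data.table.colnames) if 'table' in children else None
    layout[name] = {'children': children, 'fields': fields}
    print(name, layout[name])
```

The fixed format has no 'table' node at all, the plain table format shows values_block fields, and the data_columns=True variant lists id, Fp1, and Fp2 as separate fields.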
Upvotes: 3