WolfiG
WolfiG

Reputation: 1163

Method converting a dictionary stored in a Pandas DataFrame column into individual columns

I have time-series data in a HDF file. When I load this file I get a Pandas DataFrame like this:

DateTimeIndex | col 1 (float) | ... | col N (float) | dict_col (dict)

The dictionary in dict_col contains data structured like a normal dictionary:

{'field1': value1 (float), ..., 'fieldN': valueN (float)}

How can I convert the initial DataFrame in a DataFrame structured like this?:

DateTimeIndex | col 1 (float) | ... | col N (float) | field1 (float) | ... | floatN (float)

Currently, my code looks lie this:

data = DataFrame(pd.read_hdf('the_hdf_file.h5'))

out_frame = DataFrame() # output DataFrame

for column in data.columns:
    if column != 'dict_col':
        out_frame = pd.concat([out_frame, data[column]], axis=1)
    else:
        sub_set = data[column]
        for value in sub_set.items():
            item_series = Series(value[0])
            item_series = item_series.rename({0: 'time'})
            item_series = pd.concat([item_series, Series(value[1])])
            out_frame = pd.concat([out_frame, item_series], axis=1)

... manipulations with out_frame ...

But this code is utterly slow. How can I do this in a more efficient way?

EDIT: provided Test Data. Note that 'dict_col' is called 'rawdata_boxfit' in the test data.

Upvotes: 0

Views: 53

Answers (1)

René
René

Reputation: 4827

IIUC, you can use:

data = pd.DataFrame(pd.read_hdf('test_data_20220720.h5'))
pd.DataFrame.from_records(data.rawdata_boxfit)

enter image description here

To merge new columns use:

data = pd.DataFrame(pd.read_hdf('test_data_20220720.h5'))
new = pd.DataFrame.from_records(data.rawdata_boxfit)
new.index = data.index
pd.concat([data, new], axis=1).drop('rawdata_boxfit', axis=1)

enter image description here

Upvotes: 2

Related Questions