Reputation: 656
Given the following dictionary created from df['statistics'].head().to_dict()
{0: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
1: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
2: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
3: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
4: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}}}
Is there a way to expand the dictionary key/value pairs into their own columns and prefix these columns with the name of the original column, i.e. statisistics.executions.total would become statistics_executions_total or even executions_total?
I have demonstrated that I can create the columns using the following:
pd.concat([df.drop(['statistics'], axis=1), df['statistics'].apply(pd.Series)], axis=1)
However, you will notice that each of these newly created columns have a duplicate name "total".
I; however, have not been able to find a way to prefix the newly created columns with the original column name, i.e. executions_total.
For additional insight, statistics will expand into executions and defects and executions will expand into pass | fail | skipped | total and defects will expand into automation_bug | system_issue | to_investigate | product_bug | no_defect. The later will then expand into total | **001 columns where total is duplicated several times.
Any ideas are greatly appreciated. -Thanks!
Upvotes: 2
Views: 3616
Reputation: 62453
.apply(pd.Series)
is slow, don't use it.
'statistics'
column from the dict
in the OP.
pandas.json_normalize
on the 'statistics'
column.
sep
is .
.
sep
.import pandas as pd
# this is for setting up the test dataframe from the data in the question, where data is the name of the dict
df = pd.DataFrame({'statistics': [v for v in data.values()]})
# display(df)
statistics
0 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
1 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
2 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
3 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
4 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
# normalize the statistics column
dfs = pd.json_normalize(df.statistics)
# display(dfs)
total passed failed skipped product_bug.total product_bug.PB001 automation_bug.AB001 automation_bug.total system_issue.total system_issue.SI001 to_investigate.total to_investigate.TI001 no_defect.ND001 no_defect.total
0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0 0 0 0 0 0 0 0 0
3 1 1 0 0 0 0 0 0 0 0 0 0 0 0
4 1 1 0 0 0 0 0 0 0 0 0 0 0 0
Upvotes: 4