How to convert a column of dictionaries to separate columns in pandas?

Question

Given the following dictionary created from df['statistics'].head().to_dict()

{0: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 1: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 2: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 3: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 4: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}}}

Is there a way to expand the dictionary key/value pairs into their own columns and prefix these columns with the name of the original column, i.e. statisistics.executions.total would become statistics_executions_total or even executions_total?

I have demonstrated that I can create the columns using the following:

pd.concat([df.drop(['statistics'], axis=1), df['statistics'].apply(pd.Series)], axis=1) However, you will notice that each of these newly created columns have a duplicate name "total".

I; however, have not been able to find a way to prefix the newly created columns with the original column name, i.e. executions_total.

Any ideas are greatly appreciated. -Thanks!

Trenton McKinney · Accepted Answer

.apply(pd.Series) is slow, don't use it.
- See timing in Splitting dictionary/list inside a Pandas Column into Separate Columns
Create a DataFrame with a 'statistics' column from the dict in the OP.
- This will create a DataFrame with a column of dictionaries.
Use pandas.json_normalize on the 'statistics' column.
- The default sep is ..
  - Nested records will generate names separated by sep.

import pandas as pd

# this is for setting up the test dataframe from the data in the question, where data is the name of the dict
df = pd.DataFrame({'statistics': [v for v in data.values()]})

# display(df)
                                                                                                                                                                                                                                                                                                    statistics
0  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
1  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
2  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
3  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
4  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}

# normalize the statistics column
dfs = pd.json_normalize(df.statistics)

# display(dfs)
  total passed failed skipped  product_bug.total  product_bug.PB001  automation_bug.AB001  automation_bug.total  system_issue.total  system_issue.SI001  to_investigate.total  to_investigate.TI001  no_defect.ND001  no_defect.total
0     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
1     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
2     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
3     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
4     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0

How to convert a column of dictionaries to separate columns in pandas?

Answers (1)

Related Questions