How to split a column with values as dictionary to new columns (with NaN values)

Question

What is the best way to split a DataFrame column whose values are dictionaries (some rows do not have all the keys) into new columns?

From:

import pandas as pd

df = pd.DataFrame({'index': [0, 1, 2],
                   'a': [{'name': 'peter', 'country': 'us', 'document': 'singular', 'date': '2023-06-27'},
                         {'country': 'italy', 'document': 'earth', 'date': '2023-06-25'},
                         {'name': 'maria', 'country': 'spain', 'date': '2023-06-23'}]})

| index |                                  a                                        |
|:-----:|:-------------------------------------------------------------------------:|
|   0   | {'name':'peter','country':'us','document':'singular','date':'2023-06-27'} |
|   1   | {'country':'italy','document':'earth','date':'2023-06-25'}                |
|   2   | {'name':'maria','country':'spain','date':'2023-06-23'}                    |

To:

| index |  name  | country | document |    date    |
|:-----:|:------:|:-------:|:--------:|:----------:|
|   0   | peter  | us      | singular | 2023-06-27 |
|   1   | NaN    | italy   | earth    | 2023-06-25 |
|   2   | maria  | spain   | NaN      | 2023-06-23 |

user7864386 · Accepted Answer

You can apply pd.json_normalize to column "a" and join the result back to the DataFrame:

out = df.join(pd.json_normalize(df['a'])).drop(columns=['a'])

Output:

   index   name country  document        date
0      0  peter      us  singular  2023-06-27
1      1    NaN   italy     earth  2023-06-25
2      2  maria   spain       NaN  2023-06-23
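
One caveat worth sketching: df.join aligns on row labels, and the example above works because df has the default 0..n-1 index. If the DataFrame has been filtered or re-indexed first, the labels may no longer line up with the normalized frame. The snippet below is a minimal sketch of one way to handle that, assuming a filtered copy named subset (a hypothetical name, not from the question); it builds the new columns from a plain list and copies the original row labels onto them before joining.

```python
import pandas as pd

df = pd.DataFrame({
    'index': [0, 1, 2],
    'a': [
        {'name': 'peter', 'country': 'us', 'document': 'singular', 'date': '2023-06-27'},
        {'country': 'italy', 'document': 'earth', 'date': '2023-06-25'},
        {'name': 'maria', 'country': 'spain', 'date': '2023-06-23'},
    ],
})

# Hypothetical scenario: a filtered frame whose row labels are 0 and 2, not 0 and 1.
subset = df[df['index'] != 1]

# Normalize from a plain list of dicts, then reuse the original row labels
# so that join() matches each expanded row to the row it came from.
expanded = pd.json_normalize(subset['a'].tolist())
expanded.index = subset.index

out = subset.drop(columns=['a']).join(expanded)
print(out)
```

Keys that are missing from a row's dictionary still come out as NaN, as in the output above.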
