altermann
altermann

Reputation: 79

How to split a column with values as dictionary to new columns (with NaN values)

What is the best way to split a column in a df with values as dictionary (some lines does not have all values) to new columns?

From:

df = pd.DataFrame({'index': [0, 1, 2], 
                   'a': [{'name': 'peter', 'country': 'us', 'document': 'singular', 'date': '2023-06-27'}, 
                         {'country': 'italy', 'document': 'earth', 'date': '2023-06-25'},
                         {'name': 'maria', 'country': 'spain', 'date': '2023-06-23'}]})
| index |                                  a                                        |
|:-----:|:-------------------------------------------------------------------------:|
|   0   | {'name':'peter','country':'us','document':'singular','date':'2023-06-27'} |
|   1   | {'country':'italy','document':'earth','date':'2023-06-25'}                |
|   2   | {'name':'maria','country':'spain','date':'2023-06-23'}                    |

To

| index |  name  | country | document |    date    |
|:-----:|:------:|:-------:|:--------:|:----------:|
|   0   | peter  | us      | singular | 2023-06-27 |
|   1   | nan    | italy   | earth    | 2023-06-25 |
|   2   | maria  | spain   | NaN      | 2023-06-23 |

Upvotes: 2

Views: 72

Answers (2)

gilf0yle
gilf0yle

Reputation: 1102

you have 'a' and 'index' keys in the dictionary u can accomplish this using data in 'a' key

a=[{'name': 'peter', 'country': 'us', 'document': 'singular', 'date': '2023-06-27'},{'country': 'italy', 'document': 'earth', 'date': '2023-06-25'},{'name': 'maria', 'country': 'spain', 'date': '2023-06-23'}]

then simply do this

pd.DataFrame(a).reset_index()

output:

   index   name country  document        date
0      0  peter      us  singular  2023-06-27
1      1    NaN   italy     earth  2023-06-25
2      2  maria   spain       NaN  2023-06-23

Upvotes: 3

user7864386
user7864386

Reputation:

You can use json_normalize column "a" and join it back to the DataFrame:

out = df.join(pd.json_normalize(df['a'])).drop(columns=['a'])

Output:

   index   name country  document        date
0      0  peter      us  singular  2023-06-27
1      1    NaN   italy     earth  2023-06-25
2      2  maria   spain       NaN  2023-06-23

Upvotes: 2

Related Questions