Reputation: 79
What is the best way to split a column in a df with values as dictionary (some lines does not have all values) to new columns?
From:
df = pd.DataFrame({'index': [0, 1, 2],
'a': [{'name': 'peter', 'country': 'us', 'document': 'singular', 'date': '2023-06-27'},
{'country': 'italy', 'document': 'earth', 'date': '2023-06-25'},
{'name': 'maria', 'country': 'spain', 'date': '2023-06-23'}]})
| index | a |
|:-----:|:-------------------------------------------------------------------------:|
| 0 | {'name':'peter','country':'us','document':'singular','date':'2023-06-27'} |
| 1 | {'country':'italy','document':'earth','date':'2023-06-25'} |
| 2 | {'name':'maria','country':'spain','date':'2023-06-23'} |
To
| index | name | country | document | date |
|:-----:|:------:|:-------:|:--------:|:----------:|
| 0 | peter | us | singular | 2023-06-27 |
| 1 | nan | italy | earth | 2023-06-25 |
| 2 | maria | spain | NaN | 2023-06-23 |
Upvotes: 2
Views: 72
Reputation: 1102
you have 'a' and 'index' keys in the dictionary u can accomplish this using data in 'a' key
a=[{'name': 'peter', 'country': 'us', 'document': 'singular', 'date': '2023-06-27'},{'country': 'italy', 'document': 'earth', 'date': '2023-06-25'},{'name': 'maria', 'country': 'spain', 'date': '2023-06-23'}]
then simply do this
pd.DataFrame(a).reset_index()
output:
index name country document date
0 0 peter us singular 2023-06-27
1 1 NaN italy earth 2023-06-25
2 2 maria spain NaN 2023-06-23
Upvotes: 3
Reputation:
You can use json_normalize
column "a" and join
it back to the DataFrame:
out = df.join(pd.json_normalize(df['a'])).drop(columns=['a'])
Output:
index name country document date
0 0 peter us singular 2023-06-27
1 1 NaN italy earth 2023-06-25
2 2 maria spain NaN 2023-06-23
Upvotes: 2