Reputation: 487
How to explode pandas data frame?
Input df:
Required output df:
+----------------+------+-----+------+
|level_2 | date | val | num |
+----------------+------+-----+------+
| name_1a | 2020 | 1 | null |
| name_1b | 2019 | 2 | null |
| name_1b | 2020 | 3 | null |
| name_10000_xyz | 2018 | 4 | str |
| name_10000_xyz | 2019 | 5 | null |
| name_10000_xyz | 2020 | 6 | str |
+------------------------------------+
To reproduce input df:
import pandas as pd
pd.set_option('display.max_colwidth', None)
data={'level_2':{1:'name_1a',3:'name_1b',5:'name_10000_xyz'},'value':{1:[{'date':'2020','val':1}],3:[{'date':'2019','val':2},{'date':'2020','val':3}],5:[{'date':'2018','val':4,'num':'str'},{'date':'2019','val':5},{'date':'2020','val':6,'num':'str'}]}}
df = pd.DataFrame(data)
Upvotes: 4
Views: 4916
Reputation: 71689
Explode
the dataframe on value
column, then pop
the value
column and create a new dataframe from it then join
the new frame with the exploded frame.
s = df.explode('value', ignore_index=True)
s.join(pd.DataFrame([*s.pop('value')], index=s.index))
level_2 date val num
0 name_1a 2020 1 NaN
1 name_1b 2019 2 NaN
2 name_1b 2020 3 NaN
3 name_10000_xyz 2018 4 str
4 name_10000_xyz 2019 5 NaN
5 name_10000_xyz 2020 6 str
Upvotes: 7
Reputation: 153460
Try, explode
then use pd.DataFrame constructor build dataframe from dictionaries:
dfe = df.explode('value')
df_out = pd.DataFrame([i for i in dfe['value'].to_numpy()], index=dfe['level_2']).reset_index()
print(df_out)
Output:
level_2 date val num
0 name_1a 2020 1 NaN
1 name_1b 2019 2 NaN
2 name_1b 2020 3 NaN
3 name_10000_xyz 2018 4 str
4 name_10000_xyz 2019 5 NaN
5 name_10000_xyz 2020 6 str
Upvotes: 2