Reputation: 367
I am trying to explode a nested list inside a dict by the help of pandas. In a loop I join the list with every row. Strangely in the second iteration the join seems not to work properly. Perhaps there is something fundamental I do not understand about pandas, but I can't figure out why the join in the iteration only works on the first iteration and in the following ones the join does not work, the end the result looks like this:
key amount id key_r code name key_l
0.0 0 12.0 1.0 0.0 NaN NaN NaN
1.0 0 23.0 NaN 0.0 NaN NaN NaN
NaN 1 NaN NaN NaN test2 test2 0.0
instead of from the first iteration (strangely the key becomes 1 in the first operation):
key amount id key_r code name key_l
0 0 12 1.0 0 test1 test1 0
1 0 23 NaN 0 test1 test1 0
Code:
data = [
{
"code": "test1",
"name": "test1",
"sub_list": [
{"amount": 10, "id": 2},
{"amount": 20},
],
},
{
"code": "test2",
"name": "test2",
"sub_list": [
{"amount": 12, "id": 1},
{"amount": 23},
],
}
]
data_df = pd.DataFrame(data)
for ix, row in data_df.iterrows():
sub_list_df = pd.DataFrame(row['sub_list'])
row_df = row.to_frame().transpose()
main_df = row_df.loc[:, row_df.columns != 'sub_list']
main_df['key'] = 0
sub_list_df['key'] = 0
print(main_df)
print(sub_list_df)
tmp_df = sub_list_df.join(main_df, on=['key'], how="outer", lsuffix="_r", rsuffix="_l")
print(tmp_df)
Any advice?
Upvotes: 0
Views: 212
Reputation: 636
The problem is that value 1 is not coming from your dataframe column "key". Instead it takes the index values which are 0 and 1 since you have two rows. one of the solutions to solve this is to set the key column as index in both dataframes. for example:
main_df['_key'] = 0
sub_list_df['_key'] = 0
tmp_df = sub_list_df.set_index('_key').join(main_df.set_index('_key'), on='_key', how="outer")
Upvotes: 1
Reputation: 21739
Here's a simpler way to do without using explicit for loop:
# explode the dict
f = data_df.explode('sub_list')
# convert exploded dict into separate columns
f = pd.concat([f, f['sub_list'].apply(pd.Series)], axis=1).drop('sub_list', axis=1)
print(f)
code name amount id
0 test1 test1 10.0 2.0
0 test1 test1 20.0 NaN
1 test2 test2 12.0 1.0
1 test2 test2 23.0 NaN
Upvotes: 1