flowoo
flowoo

Reputation: 367

pandas join not working in for loop after first iteration

I am trying to explode a nested list inside a dict by the help of pandas. In a loop I join the list with every row. Strangely in the second iteration the join seems not to work properly. Perhaps there is something fundamental I do not understand about pandas, but I can't figure out why the join in the iteration only works on the first iteration and in the following ones the join does not work, the end the result looks like this:

     key  amount   id  key_r   code   name  key_l
0.0    0    12.0  1.0    0.0    NaN    NaN    NaN
1.0    0    23.0  NaN    0.0    NaN    NaN    NaN
NaN    1     NaN  NaN    NaN  test2  test2    0.0

instead of from the first iteration (strangely the key becomes 1 in the first operation):

   key  amount   id  key_r   code   name  key_l
0    0      12  1.0      0  test1  test1      0
1    0      23  NaN      0  test1  test1      0

Code:

data = [
{
    "code": "test1",
    "name": "test1",
    "sub_list": [
        {"amount": 10, "id": 2},
        {"amount": 20},
    ],
},
{
    "code": "test2",
    "name": "test2",
    "sub_list": [
        {"amount": 12, "id": 1},
        {"amount": 23},
    ],
}

]
data_df = pd.DataFrame(data)
for ix, row in data_df.iterrows():
    sub_list_df = pd.DataFrame(row['sub_list'])
    row_df = row.to_frame().transpose()
    main_df = row_df.loc[:, row_df.columns != 'sub_list']
    main_df['key'] = 0
    sub_list_df['key'] = 0
    print(main_df)
    print(sub_list_df)
    tmp_df = sub_list_df.join(main_df, on=['key'], how="outer", lsuffix="_r", rsuffix="_l")
    print(tmp_df)

Any advice?

Upvotes: 0

Views: 212

Answers (2)

Yasi Klingler
Yasi Klingler

Reputation: 636

The problem is that value 1 is not coming from your dataframe column "key". Instead it takes the index values which are 0 and 1 since you have two rows. one of the solutions to solve this is to set the key column as index in both dataframes. for example:

main_df['_key'] = 0
sub_list_df['_key'] = 0
tmp_df = sub_list_df.set_index('_key').join(main_df.set_index('_key'), on='_key', how="outer")

Upvotes: 1

YOLO
YOLO

Reputation: 21739

Here's a simpler way to do without using explicit for loop:

# explode the dict
f = data_df.explode('sub_list')

# convert exploded dict into separate columns
f = pd.concat([f, f['sub_list'].apply(pd.Series)], axis=1).drop('sub_list', axis=1)

print(f)

    code   name  amount   id
0  test1  test1    10.0  2.0
0  test1  test1    20.0  NaN
1  test2  test2    12.0  1.0
1  test2  test2    23.0  NaN

Upvotes: 1

Related Questions