Reputation: 71
I have two dataframes df1
and df2
. df1
as two columns with distinct values across multiple rows. It looks similar to the following:
col_a | col_b |
---|---|
aa | 50 |
bb | 132 |
dd | 543 |
df2
has similar structure to the following:
col_a | col__b | col_x | col_y | col_z |
---|---|---|---|---|
aa | xy | xy | xy | 2 |
aa | xy | xy | xy | 3 |
bb | xy | xy | xy | 14 |
bb | xy | xy | xy | 9 |
bb | xy | xy | xy | 6 |
cc | xy | xy | xy | 0 |
cc | xy | xy | xy | 2 |
dd | xy | xy | xy | 0 |
dd | xy | xy | xy | 2 |
I want to filter df2
through a loop using values from df1
in a way that after first iteration of loop I end up with the follow:
col_a | col_b | col_x | col_y | col_z |
---|---|---|---|---|
aa | xy | xy | xy | 2 |
aa | xy | xy | xy | 3 |
And after second iteration I should have the following:
col_a | col_b | col_x | col_y | col_z |
---|---|---|---|---|
bb | xy | xy | xy | 14 |
bb | xy | xy | xy | 9 |
bb | xy | xy | xy | 6 |
I do not want the filtering to modify the df2
permanently. Subsequent iterations should filter df2
based on values from col_a of df1
.
I have no idea how to achieve this so I would appreciate any help.
Upvotes: 0
Views: 984
Reputation: 1179
You can iterate through your unique column values, then filter your df2 based on each unique value. You don't need drop_duplicates if df1 values are unique, but I include it just in case in this toy example:
df1 = pd.DataFrame({'col_a':['aa', 'bb']})
df2 = pd.DataFrame({'col_a':['aa', 'aa', 'bb', 'bb'], 'other column':[1, 2, 3, 4]})
for col_value in df1['col_a'].drop_duplicates():
df_temp_filtered = df2[df2['col_a'] == col_value]
to get this df_temp_filtered in round 2:
col_a other column
2 bb 3
3 bb 4
Upvotes: 1