Excel-lit
Excel-lit

Reputation: 71

Filter one dataframe using another through a loop pandas

I have two dataframes df1 and df2. df1 as two columns with distinct values across multiple rows. It looks similar to the following:

col_a col_b
aa 50
bb 132
dd 543

df2 has similar structure to the following:

col_a col__b col_x col_y col_z
aa xy xy xy 2
aa xy xy xy 3
bb xy xy xy 14
bb xy xy xy 9
bb xy xy xy 6
cc xy xy xy 0
cc xy xy xy 2
dd xy xy xy 0
dd xy xy xy 2

I want to filter df2 through a loop using values from df1 in a way that after first iteration of loop I end up with the follow:

col_a col_b col_x col_y col_z
aa xy xy xy 2
aa xy xy xy 3

And after second iteration I should have the following:

col_a col_b col_x col_y col_z
bb xy xy xy 14
bb xy xy xy 9
bb xy xy xy 6

I do not want the filtering to modify the df2 permanently. Subsequent iterations should filter df2 based on values from col_a of df1.

I have no idea how to achieve this so I would appreciate any help.

Upvotes: 0

Views: 984

Answers (1)

Hammurabi
Hammurabi

Reputation: 1179

You can iterate through your unique column values, then filter your df2 based on each unique value. You don't need drop_duplicates if df1 values are unique, but I include it just in case in this toy example:

df1 = pd.DataFrame({'col_a':['aa', 'bb']})
df2 = pd.DataFrame({'col_a':['aa', 'aa', 'bb', 'bb'], 'other column':[1, 2, 3, 4]})

for col_value in df1['col_a'].drop_duplicates():
    df_temp_filtered = df2[df2['col_a'] == col_value]

to get this df_temp_filtered in round 2:

  col_a  other column
2    bb             3
3    bb             4

Upvotes: 1

Related Questions