Reputation: 11

Pandas filtering rows in one dataframe based on values in another data frame

New to pandas and trying to figure out filtering results from a dataframe based on data in another dataframe.

df1                   df2
v1  v2                v1   v2
1    2                1    3
1    3                2    3
1    4                3    4
2    2                
2    4
3    2
3    3   
3    4

I have two dataframes: df1 and df2. I'm trying to create df3 which filters off results where if df1.v1 = df2.v1 and df1.v2 < df2.v2. This would yield df3 here:

I'm not sure how to go about doing this. I'm thinking of maybe converting df2 to a dictionary (df.v1 are all unique).

Upvotes: 0

Answers (3)

Corralien

Reputation: 120399

You can merge your 2 dataframes on v1 and filter out:

df3 = df1.merge(df2, on='v1', suffixes=('_', '')) \
         .query("v2 <= v2_").drop(columns='v2_')
print(df3)

# Output
   v1  v2
1   1   3
2   1   3
4   2   3
7   3   4

Update

In case you don't have a strict equality between v1 columns, you can merge with how='cross' to create all combinations:

df3 = df1.merge(df2, how='cross', suffixes=('_', '')) \
         .query("(v1 == v1_) & (v2 <= v2_)").drop(columns=['v1_', 'v2_'])
print(df3)

# Output
    v1  v2
3    1   3
6    1   3
13   2   3
23   3   4

Upvotes: 2

user7864386

Reputation:

merge on "v1" and use a boolean mask to filter:

merged_df = df1.merge(df2, on='v1', suffixes=('_df1', ''))
merged_df.loc[merged_df['v2_df1'] < merged_df['v2'], ['v1', 'v2']]

Output:

Upvotes: 3

mozway

Reputation: 260455

You can merge on v1, then filter on the v2 values and cleanup the intermediate column:

(df1.merge(df2, on='v1', suffixes=('_', ''))
    .query('v2 <= v2_')
    .drop(columns='v2_')
)

output:

Upvotes: 2

Pandas filtering rows in one dataframe based on values in another data frame

Answers (3)

Related Questions