Reputation: 184
I've encountered a problem which I didn't succeed to solve, for now. Your assistance on this one would be highly appreciated.
I have 2 dataframes:
first_df:
A B C D
1 1 a q zz
2 2 b w xx
3 3 c e yy
4 4 d r vv
second_df:
C1 C2
1 10 a
2 20 b
3 70 g
4 80 h
What I want to achieve is mark rows from first_df, based on values from second_df. Marking those rows should be based on comparing column values:
(first_df['A'] == second_df['C1'] * 10) & (first_df['B'] == second_df['C2'])
Expected output should be like this:
compared_df:
A B C D Match
1 1 a q zz True
2 2 b w xx True
3 3 c e yy False
4 4 d r vv False
Could you please point me in a right direction? Which pandas methods should I use to achieve my goal? I've tried many things, but I'm pandas beginner, so it's hard to assess if those attempts were correct.
Upvotes: 0
Views: 1072
Reputation: 71689
First create Multiindex on both the dataframes then use MultiIndex.isin
to test for the occurrence of the index values of first dataframe in the index of second dataframe in order the create boolean flag:
i1 = first_df.set_index([first_df['A'] * 10, 'B']).index
i2 = second_df.set_index(['C1', 'C2']).index
first_df['Match'] = i1.isin(i2)
Result
print(first_df)
A B C D Match
1 1 a q zz True
2 2 b w xx True
3 3 c e yy False
4 4 d r vv False
Upvotes: 1
Reputation: 1195
If you don't mind writing it as sql code and you can import libraries to your project, you might use pandasql library.
import pandasql as psql
compared_df = psql.sqldf("""
select f.A, f.B, f.C, f.D,
CASE WHEN s.C1 is not null then 1 else 0 end as Match
FROM first_df f
LEFT JOIN second_df s ON (f.A *10 = s.C1 AND f.B = s.C2)
""")
Upvotes: 0
Reputation:
The code you wrote is almost completely correct; I think you just multiplied the wrong column by 10
:
first_df['Match'] = (first_df['A'] * 10 == second_df['C1']) & (first_df['B'] == second_df['C2'])
Output:
>>> first_df
A B C D Match
1 1 a q zz True
2 2 b w xx True
3 3 c e yy False
4 4 d r vv False
Upvotes: 0