Skipper Lin
Skipper Lin

Reputation: 179

Populate a new dataframe column with True if two cell values match another smaller subset dataframe in pandas

I am looking to populate a new dataframe column with True if two cell values match another smaller subset dataframe in pandas, otherwise with a value of False.

For instance, this is original output dataframe I am constructing.

ID    Type
1     A
2     B
3     A
4     A
5     C
6     A
7     D
8     A
9     B
10    A

And the smaller subset of the dataframe selected based on some criteria:

ID    Type
1     A
3     A
4     A
5     C
7     D
10    A

What I am trying to accomplish is when ID and Type in the output dataframe match with the smaller subset datadrame, I want to populate a new column called 'Result' and value equals to True. Otherwise, value equals to False.

ID    Type    Result
1     A       True
2     B       False
3     A       True
4     A       True
5     C       True
6     A       False
7     D       True
8     A       False
9     B       False
10    A       True

Upvotes: 0

Views: 273

Answers (1)

SeaBean
SeaBean

Reputation: 23217

You can .merge() the 2 dataframes using a left merge with the original dataframe as base and turn on the indicator= parameter to show the merge result. Then change the merge result to True for the rows that appear in both dataframes and False otherwise.

df_out = df1.merge(df2, on=['ID', 'Type'] , how='left', indicator='Result')
df_out['Result'] = (df_out['Result'] == 'both')

Explanation:

With indicator= parameter turn on, Pandas will show you the merge result of which dataframe the current row are from (in terms of both, left_only and right_only)

df_out = df1.merge(df2, on=['ID', 'Type'] , how='left', indicator='Result')

print(df_out)


   ID Type     Result
0   1    A       both
1   2    B  left_only
2   3    A       both
3   4    A       both
4   5    C       both
5   6    A  left_only
6   7    D       both
7   8    A  left_only
8   9    B  left_only
9  10    A       both

Then, we transform the both and others to True/False by boolean mask, as follows:

df_out['Result'] = (df_out['Result'] == 'both')

print(df_out)



   ID Type  Result
0   1    A    True
1   2    B   False
2   3    A    True
3   4    A    True
4   5    C    True
5   6    A   False
6   7    D    True
7   8    A   False
8   9    B   False
9  10    A    True

Upvotes: 1

Related Questions