Reputation: 1
I have two pandas DataFrames (df1 and df2) with identical structures, each with several hundred thousand rows.
For each row, I'd like to update a field indicating whether that row's ID is found anywhere in a field of the other DataFrame.
import pandas as pd
df1 = pd.DataFrame([['AAA',''],['BBB',''],['CCC','']], columns=['ID','Match'])
df2 = pd.DataFrame([['FFF',''],['BBB',''],['AAA',''],['BBB','']], columns=['ID','Match'])
And I'd like df2 to end up looking like this:
ID Match
FFF N
BBB Y
AAA Y
BBB Y
Upvotes: 0
Views: 41
Reputation:
You could join the IDs in df1 with "|" and use str.contains to identify the IDs in df2 that contain any ID from df1; then use np.where to assign "Y" if there is a match, "N" otherwise:
import numpy as np
df2['Match'] = np.where(df2['ID'].str.contains('|'.join(df1['ID'])), 'Y', 'N')
Output:
ID Match
0 FFF N
1 BBB Y
2 AAA Y
3 BBB Y
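One caveat: by default str.contains treats the joined string as a regular expression and does a substring search, so an ID such as "AAAB" in df2 would also count as a match for "AAA". If exact equality of IDs is what's wanted, one possible alternative (a different technique from the one above, not a drop-in equivalent) is Series.isin, which also avoids building a very long pattern from several hundred thousand IDs. A minimal sketch, assuming the sample frames from the question:

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame([['AAA', ''], ['BBB', ''], ['CCC', '']], columns=['ID', 'Match'])
df2 = pd.DataFrame([['FFF', ''], ['BBB', ''], ['AAA', ''], ['BBB', '']], columns=['ID', 'Match'])

# isin tests exact membership: True where the df2 ID equals some ID in df1
df2['Match'] = np.where(df2['ID'].isin(df1['ID']), 'Y', 'N')

# The same check in the other direction fills in df1
df1['Match'] = np.where(df1['ID'].isin(df2['ID']), 'Y', 'N')

print(df2)
print(df1)
```

On this sample data it gives the same df2 as above. The two approaches would differ only where one ID is a substring of another or contains regex special characters.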
Upvotes: 1