Reputation: 79
I have two DataFrames that are completely dissimilar except for certain values in one particular column:
df
First Last Email Age
0 Adam Smith [email protected] 30
1 John Brown [email protected] 35
2 Joe Max [email protected] 40
3 Will Bill [email protected] 25
4 Johnny Jacks [email protected] 50
df2
ID Location Contact
0 5435 Austin [email protected]
1 4234 Atlanta [email protected]
2 7896 Toronto [email protected]
How would I go about finding the matching values in the Email column of df and the Contact column of df2, and then dropping the whole row in df based on that match?
Output I'm looking for (index numbering doesn't matter):
df1
First Last Email Age
1 John Brown [email protected] 35
3 Will Bill [email protected] 25
I've been able to identify matches using a few different methods like:
Changing the column names to be identical
common = df.merge(df2,on=['Email'])
df3 = df[(~df['Email'].isin(common['Email']))]
But df3 still shows all the rows from df.
I've also tried:
common = df['Email'].isin(df2['Contact'])
df.drop(df[common].index, inplace = True)
And again, identifies the matches but df still contains all original rows.
So the main thing I'm having difficulty with is updating df with the matches dropped or creating a new DataFrame that contains only the rows with dissimilar values when comparing the Email column from df and the Contact column in df2. Appreciate any suggestions.
Upvotes: 0
Views: 68
Reputation: 815
As mentioned in the comments(@Arkadiusz), it is enough to filter your data using the following
df3 = df[(~df['Email'].isin(df2.Contact))].copy()
print(df3)
Upvotes: 1