Reputation: 133
I have two data sets
1 set it has a column with a list of email address:
DF1
Email
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
2nd csv Dataframe2
Email
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
import pandas as pd
SansList = r'C:\\Sans compare\\SansList.csv'
AllUsers = r'C:\\Sans compare\\AllUser.csv'
## print Name column only and turn into data sets from CSV ##
df1 = pd.read_csv(SansList, usecols=[0])
df2 = pd.read_csv(AllUsers, usecols=[2])
**print(df1['Email'].isin(df2)==False)**
I want the results to be,
Dataframe3
[email protected]
[email protected]
[email protected]
Not quite sure how to fix my dataset... :(
Upvotes: 3
Views: 1223
Reputation: 210852
Numpy solution:
In [311]: df2[~np.in1d(df2.Email, df1.Email)]
Out[311]:
Email
4 [email protected]
5 [email protected]
6 [email protected]
Upvotes: 1
Reputation: 294318
Option 1
isin
df2[~df2.Email.isin(df1.Email)]
Email
4 [email protected]
5 [email protected]
6 [email protected]
Option 2
query
df2.query('Email not in @df1.Email')
Email
4 [email protected]
5 [email protected]
6 [email protected]
Option 3
merge
pd.DataFrame.merge
with indicator=True
, enables you to see which dataframe the row came from. We can then filter on it.
df2.merge(
df1, 'outer', indicator=True
).query('_merge == "left_only"').drop('_merge', 1)
Email
20 [email protected]
21 [email protected]
22 [email protected]
Upvotes: 1