ah2Bwise

Reputation: 152

NumPy isin() is not returning the expected result

Based on the code below, I would expect the first element of the 'duplicate' column to be True, since 'juli' exists in df_set. The real data set is much larger, which is why I convert the column to a set.

What am I doing wrong that causes the first element of 'duplicate' to be False?

import numpy as np
import pandas as pd

data = [
    ['tom', 'juli'],
    ['nick', 'heather'],
    ['juli', 'john'],
    ['dustin', 'tracy']
]
columns = ['Name', 'Name2']

df = pd.DataFrame(data, columns=columns)
df_set = set(df['Name'])
df['duplicate'] = np.isin(df['Name2'], df_set, assume_unique=True)
print(df)

Output:

     Name    Name2  duplicate
0     tom     juli      False
1    nick  heather      False
2    juli     john      False
3  dustin    tracy      False

Upvotes: 0

Views: 1458

Answers (2)

wwnde

Reputation: 26686

Another option is to stay within pandas and use Series.isin, which accepts a set directly:

df['duplicate'] = df['Name2'].isin(set(df['Name']))

Output:

     Name    Name2  duplicate
0     tom     juli       True
1    nick  heather      False
2    juli     john      False
3  dustin    tracy      False
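
As a self-contained sketch of the same idea, the snippet below rebuilds a small frame and applies Series.isin with a set. The repeated 'juli' in Name2 is made up for illustration (it is not in the question's data) to show that the check does not rely on the values being unique.

```python
import pandas as pd

# Hypothetical data: 'juli' appears twice in Name2 on purpose.
df = pd.DataFrame({
    "Name": ["tom", "nick", "juli", "dustin"],
    "Name2": ["juli", "heather", "juli", "tracy"],
})

# Series.isin accepts a set (or any list-like) of values to test against.
df["duplicate"] = df["Name2"].isin(set(df["Name"]))
print(df)
```

Both 'juli' rows should come back True here, with no assume_unique flag involved.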

Upvotes: 0

user17242583

Reputation:

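np.isin does not unpack a set. It passes the second argument through the array constructor, which wraps a set whole in a one-element object array, so every value gets compared against the set itself and nothing matches. Convert the set back to a list first: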

df['duplicate'] = np.isin(df['Name2'], list(df_set), assume_unique=True)

Output:

>>> df
     Name    Name2  duplicate
0     tom     juli       True
1    nick  heather      False
2    juli     john      False
3  dustin    tracy      False
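
To see why the set version returns all False, here is a minimal sketch that leaves pandas out and looks at what NumPy does with a set. The variable names are illustrative, and the object dtype on name2 is an assumption chosen to mimic a pandas string column.

```python
import numpy as np

names = {"tom", "nick", "juli", "dustin"}
name2 = np.array(["juli", "heather", "john", "tracy"], dtype=object)

# A set is not a sequence, so NumPy stores it whole in a 0-d object array
# instead of building an array of its four strings.
wrapped = np.asarray(names)
print(wrapped.shape, wrapped.dtype)

# Each string is compared against the set object itself, so nothing matches.
mask_set = np.isin(name2, names)

# A list is a sequence, so its elements become the test values.
mask_list = np.isin(name2, list(names))

print(mask_set)
print(mask_list)
```

assume_unique=True is left out on purpose: it is only a speed hint, and it presumes that neither input holds repeated values, which may not be true of a larger data set.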

Upvotes: 5
