Reputation: 316
I have a data frame which looks below
k={'ID':[1,2,3,4,5,6],'Name':['John Danny','Micheal K','John Danny','jerred','John Danny','joe'],'phone':['1111',
'2222','2233','1111','2222','6666']}
df=pd.DataFrame(data=k)
df
ID Name phone
1 John Danny 1111
2 Micheal K 2222
3 John Danny 2233
4 jerred 1111
5 John Danny 2222
I need to find the duplicated in name and phone in the data frame so used the below-given code
df[df['Name'].duplicated(keep=False)].sort_values("Name")
duplicated based on name
ID Name phone
1 John Danny 1111
3 John Danny 2233
5 John Danny 2222
duplicated based on phone
ID Name phone
1 John Danny 1111
4 jerred 1111
2 Micheal K 2222
5 John Danny 2222
ID Name phone duplicated of name ids duplicated of phone ids Duplicate_name Duplicate_phone
1 John Danny 1111 3,5 4 Yes Yes
2 Micheal K 2222 5 No Yes
3 John Danny 2233 1,5 Yes No
4 jerred 1111 1 No Yes
5 John Danny 2222 1,3 2 Yes Yes
I was able to find duplicate_name and duplicate_phone by using below code
df['Duplicate_name'] = df['Name'].duplicated(keep=False).map({True:'Yes', False:'No'})
df['Duplicate_phone'] = df['phone'].duplicated(keep=False).map({True:'Yes', False:'No'})
Upvotes: 2
Views: 1167
Reputation: 862581
Use GroupBy.transform
with custom function for subtract set
s:
def f(x):
return [', '.join(set(x) - set([y])) for y in x]
Or filter with if
in generator:
def f(x):
return [', '.join(z for z in x if z != y) for y in x]
df['duplicated of name ids'] = df['ID'].astype(str).groupby(df['Name']).transform(f)
df['duplicated of phone ids'] = df['ID'].astype(str).groupby(df['phone']).transform(f)
df['Duplicate_name'] = df['Name'].duplicated(keep=False).map({True:'Yes', False:'No'})
df['Duplicate_phone'] = df['phone'].duplicated(keep=False).map({True:'Yes', False:'No'})
print (df)
ID Name phone duplicated of name ids duplicated of phone ids \
0 1 John Danny 1111 5, 3 4
1 2 Micheal K 2222 5
2 3 John Danny 2233 5, 1
3 4 jerred 1111 1
4 5 John Danny 2222 1, 3 2
5 6 joe 6666
Duplicate_name Duplicate_phone
0 Yes Yes
1 No Yes
2 Yes No
3 No Yes
4 Yes Yes
5 No No
Upvotes: 2