vka
vka

Reputation: 316

how to display duplicate ID along with duplicate data in the python pandas

I have a data frame which looks below

k={'ID':[1,2,3,4,5,6],'Name':['John Danny','Micheal K','John Danny','jerred','John Danny','joe'],'phone':['1111',
                                                                                   '2222','2233','1111','2222','6666']}
df=pd.DataFrame(data=k)
df
    ID  Name       phone
    1   John Danny  1111
    2   Micheal K   2222
    3   John Danny  2233
    4   jerred      1111
    5   John Danny  2222

I need to find the duplicated in name and phone in the data frame so used the below-given code

df[df['Name'].duplicated(keep=False)].sort_values("Name")

duplicated based on name


ID  Name       phone
1   John Danny  1111
3   John Danny  2233
5   John Danny  2222

duplicated based on phone

    ID  Name       phone
    1   John Danny  1111
    4   jerred      1111
    2   Micheal K   2222
    5   John Danny  2222

but I want the result as follows

ID  Name      phone duplicated of name ids  duplicated of phone ids Duplicate_name  Duplicate_phone
1   John Danny  1111    3,5                    4                    Yes              Yes
2   Micheal K   2222                           5                     No              Yes
3   John Danny  2233    1,5                                          Yes              No
4   jerred      1111                           1                     No              Yes
5   John Danny  2222    1,3                    2                     Yes             Yes

I was able to find duplicate_name and duplicate_phone by using below code

df['Duplicate_name'] = df['Name'].duplicated(keep=False).map({True:'Yes', False:'No'})
df['Duplicate_phone'] = df['phone'].duplicated(keep=False).map({True:'Yes', False:'No'})

The problem is not able display ID in duplicated of phone ids and duplicated of name ids as above given result table how to do it?

Upvotes: 2

Views: 1167

Answers (1)

jezrael
jezrael

Reputation: 862581

Use GroupBy.transform with custom function for subtract sets:

def f(x):
    return [', '.join(set(x) - set([y])) for y in x]

Or filter with if in generator:

def f(x):
    return [', '.join(z for z in x if z != y) for y in x]

df['duplicated of name ids'] = df['ID'].astype(str).groupby(df['Name']).transform(f)
df['duplicated of phone ids'] = df['ID'].astype(str).groupby(df['phone']).transform(f)


df['Duplicate_name'] = df['Name'].duplicated(keep=False).map({True:'Yes', False:'No'})
df['Duplicate_phone'] = df['phone'].duplicated(keep=False).map({True:'Yes', False:'No'})
print (df)
   ID        Name phone duplicated of name ids duplicated of phone ids  \
0   1  John Danny  1111                   5, 3                       4   
1   2   Micheal K  2222                                              5   
2   3  John Danny  2233                   5, 1                           
3   4      jerred  1111                                              1   
4   5  John Danny  2222                   1, 3                       2   
5   6         joe  6666                                                  

  Duplicate_name Duplicate_phone  
0            Yes             Yes  
1             No             Yes  
2            Yes              No  
3             No             Yes  
4            Yes             Yes  
5             No              No  

Upvotes: 2

Related Questions