user8560167
user8560167

Reputation:

Erase columns where duplicated rows exist, in groups. Pandas

I need to show columns which have only duplicated rows inside, in Name groups I cannot remove/drop columns for one groupo because for other this specific column could be usefull. So when in specific column will be duplicates i need to make this column empty (replace with np.nan for example)

my df:

Name,B,C,D
Adam,20,dog,cat
Adam,20,cat,elephant
Katie,21,cat,cat
Katie,21,cat,dog
Brody,22,dog,dog
Brody,21,cat,dog

expected output:

#grouping by Name, always two Names are same, not less not more.
Name,B,C,D
Adam,np.nan,dog,cat
Adam,np.nan,cat,elephant
Katie,np.nan,np.nan,cat
Katie,np.nan,np.nan,dog
Brody,22,dog,np.nan
Brody,21,cat,np.nan

I know I should use groupby() function and duplicated() but I dont know how this approach should looks like

output=df[df.duplicated(keep=False)].groupby('Name')
output=output.replace({True:'np.nan'},regex=True)   

Upvotes: 2

Views: 40

Answers (1)

jezrael
jezrael

Reputation: 862571

Use GroupBy.transform with lambda function and DataFrame.mask for replace:

df = df.set_index('Name')
output=df.mask(df.groupby('Name').transform(lambda x: x.duplicated(keep=False))).reset_index()
print (output)
    Name     B    C         D
0   Adam   NaN  dog       cat
1   Adam   NaN  cat  elephant
2  Katie   NaN  NaN       cat
3  Katie   NaN  NaN       dog
4  Brody  22.0  dog       NaN
5  Brody  21.0  cat       NaN

Upvotes: 0

Related Questions