Reputation:
I need to show columns which have only duplicated rows inside, in Name
groups
I cannot remove/drop columns for one groupo because for other this specific column could be usefull.
So when in specific column will be duplicates i need to make this column empty (replace with np.nan for example)
my df:
Name,B,C,D
Adam,20,dog,cat
Adam,20,cat,elephant
Katie,21,cat,cat
Katie,21,cat,dog
Brody,22,dog,dog
Brody,21,cat,dog
expected output:
#grouping by Name, always two Names are same, not less not more.
Name,B,C,D
Adam,np.nan,dog,cat
Adam,np.nan,cat,elephant
Katie,np.nan,np.nan,cat
Katie,np.nan,np.nan,dog
Brody,22,dog,np.nan
Brody,21,cat,np.nan
I know I should use groupby()
function and duplicated()
but I dont know how this approach should looks like
output=df[df.duplicated(keep=False)].groupby('Name')
output=output.replace({True:'np.nan'},regex=True)
Upvotes: 2
Views: 40
Reputation: 862571
Use GroupBy.transform
with lambda function and DataFrame.mask
for replace:
df = df.set_index('Name')
output=df.mask(df.groupby('Name').transform(lambda x: x.duplicated(keep=False))).reset_index()
print (output)
Name B C D
0 Adam NaN dog cat
1 Adam NaN cat elephant
2 Katie NaN NaN cat
3 Katie NaN NaN dog
4 Brody 22.0 dog NaN
5 Brody 21.0 cat NaN
Upvotes: 0