How to Remove the Data in a Specific Column for the Duplicate IDs?

Question

I have this simple dataframe:

ID  Name    State
1   John    DC
1   John    VA
2   Smith   NE
3   Janet   CA
3   Janet   NC
3   Janet   MD

I want to delete the State value for the duplicate IDs like so:

ID  Name    State
1   John    nan
1   John    nan
2   Smith   NE
3   Janet   nan
3   Janet   nan
3   Janet   nan

Any idea how to solve this problem?

Thanks,

piRSquared · Accepted Answer

duplicated returns a boolean mask that is True for every row whose values in the subset columns also appear in another row. keep=False marks all members of a duplicate group, so neither the first nor the last occurrence is treated as non-duplicate. Using loc with that mask then assigns to the State column only in the rows where duplicates occur.
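
To see what keep changes, here is a small sketch that prints the mask for each setting. The frame name ids and the dict masks are made up for illustration; only the ID values come from the question.

```python
import pandas as pd

# Only the ID column from the question; "ids" is a hypothetical name.
ids = pd.DataFrame({"ID": [1, 1, 2, 3, 3, 3]})

masks = {}
for keep in ("first", "last", False):
    # True marks a row that duplicated() considers a duplicate.
    masks[keep] = ids.duplicated(subset=["ID"], keep=keep).tolist()
    print(keep, masks[keep])

# first  [False, True, False, False, True, True]
# last   [True, False, False, True, True, False]
# False  [True, True, False, True, True, True]
```

Only keep=False flags every row of IDs 1 and 3, which is what the question asks for.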

df.loc[df.duplicated(subset=['ID'], keep=False), 'State'] = None

df
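
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same idea. It assigns np.nan instead of None, which is my substitution and not part of the answer above; both count as missing values in pandas.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "Name": ["John", "John", "Smith", "Janet", "Janet", "Janet"],
    "State": ["DC", "VA", "NE", "CA", "NC", "MD"],
})

# True for every row whose ID occurs more than once.
mask = df.duplicated(subset=["ID"], keep=False)

# Blank out State only in those rows; ID 2 is left untouched.
df.loc[mask, "State"] = np.nan

print(df)
```

After this runs, State is missing for IDs 1 and 3 and still NE for ID 2.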
