ramez
ramez

Reputation: 431

Merge duplicate rows and delete all duplicates after merging into one row

I have a pandas dataframe with multiple duplicate IDs as such:

 id                col1          col2      col3
ENE80R             N             N         Y
ENE80R             N             N         N
ENE80R             Y             N         N

What I'm trying to achieve is this:

 id                col1          col2      col3
ENE80R              Y             N         Y

I tried using df.drop_duplicates but it doesn't update all the Ys into one column, it just keeps the first row and removes the duplicates. Any idea how I can go about this?

EDIT: The data doesn't only contain Y and N in it, there are columns that contain data and other text and integer data. But the only data that differ in the duplicated data are the Ys and Ns.

Upvotes: 0

Views: 352

Answers (2)

Bharath M Shetty
Bharath M Shetty

Reputation: 30605

Perhaps a simple max by groupby since Y is greater than N. i.e

df.groupby('id').max()

      col1 col2 col3
id                   
ENE80R    Y    N    Y

Or Thanks @pirSquared:

df.set_index('id').max(level=0)

Upvotes: 1

piRSquared
piRSquared

Reputation: 294218

You want to replace 'Y' for True and 'N' for False then do a groupby with any. Finally, clean it up by reversing the replace.

df.replace(dict(Y=True, N=False)) \
  .groupby('id').any() \
  .replace({True: 'Y', False: 'N'})

       col1 col2 col3
id                   
ENE80R    Y    N    Y

Alternatively, you could use pd.DataFrame.any with the level parameter.

df.set_index('id').eq('Y').any(level=0).replace({True: 'Y', False: 'N'})

       col1 col2 col3
id                   
ENE80R    Y    N    Y

Upvotes: 3

Related Questions