MaxB
MaxB

Reputation: 458

Combine duplicate rows on specific column

I am trying to combine rows of a dataframe in the event that there is a duplicate in one column. The dataframe looks like the following.

Name   Code   X   Y
 A     123   10   11
 B     456   12   13
 C     123   15   16

I want to combine on Code. So if the Code is the same, combine the other data separated by a comma. The resulting df would look like this:

Name   Code    X       Y
A,C    123   10,15   11,16
 B     456    12       13

My approach was the following:

    df = df.groupby(['Name','Code','Y'])['X'].astype(str).apply(', '.join).reset_index() 

    df = df.groupby(['Name','Code','X'])['Y'].astype(str).apply(', '.join).reset_index() 

I get the following error :

"Cannot access callable attribute 'astype' of 'SeriesGroupBy' objects, try using the 'apply' method"

I have been unable to figure out how to use apply to cast as type str, any tips?

Upvotes: 3

Views: 81

Answers (3)

Sokolokki
Sokolokki

Reputation: 863

More general solution:

import pandas as pd
df = pd.DataFrame([['A',123,10,11],['B',456,12,13],['C',123,15,16]],columns=['Name','Code','X','Y'])

def f(x):
    return ','.join(x)

df = df.astype(str).groupby('Code').agg(f)
df.index = [int(i) for i in df.index.tolist()] 

The last line changes the index from string back to integer type. I added this solution because it is easy to understand, however not the most elegant.

Upvotes: 1

jezrael
jezrael

Reputation: 862661

Create index from Code column fo avoid casting to strings, then cast all columns and aggregate by index function join:

df = df.set_index('Code').astype(str).groupby(level=0).agg(', '.join).reset_index()
#pandas 0.24+
#df = df.set_index('Code').astype(str).groupby('Code').agg(', '.join).reset_index()
print (df)
   Code  Name       X       Y
0   123  A, C  10, 15  11, 16
1   456     B      12      13

Upvotes: 5

BENY
BENY

Reputation: 323236

Another layout

(df.set_index('Code').astype(str)+',').sum(level=0).apply(lambda x : x.str.strip(','))
Out[50]: 
     Name      X      Y
Code                   
123   A,C  10,15  11,16
456     B     12     13

Upvotes: 3

Related Questions