Reputation: 458
I am trying to combine rows of a dataframe in the event that there is a duplicate in one column. The dataframe looks like the following.
Name Code X Y
A 123 10 11
B 456 12 13
C 123 15 16
I want to combine on Code. So if the Code is the same, combine the other data separated by a comma. The resulting df would look like this:
Name Code X Y
A,C 123 10,15 11,16
B 456 12 13
My approach was the following:
df = df.groupby(['Name','Code','Y'])['X'].astype(str).apply(', '.join).reset_index()
df = df.groupby(['Name','Code','X'])['Y'].astype(str).apply(', '.join).reset_index()
I get the following error :
"Cannot access callable attribute 'astype' of 'SeriesGroupBy' objects, try using the 'apply' method"
I have been unable to figure out how to use apply to cast as type str, any tips?
Upvotes: 3
Views: 81
Reputation: 863
More general solution:
import pandas as pd
df = pd.DataFrame([['A',123,10,11],['B',456,12,13],['C',123,15,16]],columns=['Name','Code','X','Y'])
def f(x):
return ','.join(x)
df = df.astype(str).groupby('Code').agg(f)
df.index = [int(i) for i in df.index.tolist()]
The last line changes the index from string
back to integer
type.
I added this solution because it is easy to understand, however not the most elegant.
Upvotes: 1
Reputation: 862661
Create index from Code
column fo avoid casting to strings, then cast all columns and aggregate by index function join
:
df = df.set_index('Code').astype(str).groupby(level=0).agg(', '.join).reset_index()
#pandas 0.24+
#df = df.set_index('Code').astype(str).groupby('Code').agg(', '.join).reset_index()
print (df)
Code Name X Y
0 123 A, C 10, 15 11, 16
1 456 B 12 13
Upvotes: 5
Reputation: 323236
Another layout
(df.set_index('Code').astype(str)+',').sum(level=0).apply(lambda x : x.str.strip(','))
Out[50]:
Name X Y
Code
123 A,C 10,15 11,16
456 B 12 13
Upvotes: 3