Reputation: 386
We have the Pandas DataFrame below.
Desired output:
I want comma-separated values per group of the data column, ordered by the order column.
I tried the code below, but it puts the values in its own order.
data = (df['section'].str.split(', ')
          .groupby(df['data'])
          .agg(lambda x: ', '.join(set(y for z in x for y in z)))
          .reset_index())
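The original frame and the desired output were posted as images; for reference, a sketch of an input frame matching the sample data printed later in the answer is:

import pandas as pd

# reconstructed from the print(df) sample shown in the answer below
df = pd.DataFrame({
    'data': ['google', 'google', 'google', 'google', 'google',
             'Microsoft', 'Microsoft', 'Microsoft', 'Microsoft'],
    'section': ['code, code', 'allergy', 'reason', 'service', 'advance',
                'ab, ab, kl', 'eh, eh, cd', 'ef', 'gf'],
    'order': [1, 3, 2, 5, 4, 1, 3, 2, 4]
})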
Upvotes: 2
Views: 620
Reputation: 863166
Use DataFrame.sort_values before groupby, and if you need the same group order as in the original data column, add DataFrame.reindex with the unique values returned by Series.unique:
df1 = (df.sort_values(['data','order'])
         .groupby('data')['section']
         .agg(','.join)
         .reindex(df['data'].unique())
         .reset_index())
print (df1)
        data                               section
0     google  code,reason,allergy,advance,service
1  Microsoft                           ab,ef,cd,gf
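For comparison, a minimal sketch of what happens without the reindex step, using the same df: groupby sorts the group keys, so the Microsoft group would come before google instead of keeping the original order.

# without reindex, group keys are sorted ('Microsoft' sorts before 'google')
print (df.sort_values(['data','order'])
         .groupby('data')['section']
         .agg(','.join))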
If the section values are themselves joined by ', ', first sort the data and then use a dictionary for the unique values, because sets have no defined order (a small demonstration of this follows after the output below):
print (df)
        data     section  order
0     google  code, code      1
1     google     allergy      3
2     google      reason      2
3     google     service      5
4     google     advance      4
5  Microsoft  ab, ab, kl      1
6  Microsoft  eh, eh, cd      3
7  Microsoft          ef      2
8  Microsoft          gf      4
df1 = df.sort_values(['data','order'])

data = (df1['section'].str.split(', ')
           .groupby(df['data'])
           .agg(lambda x: ', '.join(dict.fromkeys(y for z in x for y in z).keys()))
           .reindex(df['data'].unique())
           .reset_index())
print (data)
        data                                   section
0     google  code, reason, allergy, advance, service
1  Microsoft                    ab, kl, ef, eh, cd, gf
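Why dict.fromkeys rather than set: a dictionary keeps insertion order (Python 3.7+), while a set does not, as this small sketch shows:

items = ['ab', 'ab', 'kl', 'ef']
print(', '.join(dict.fromkeys(items)))  # ab, kl, ef - insertion order kept
print(', '.join(set(items)))            # order is arbitrary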
An alternative for removing duplicates is DataFrame.explode with DataFrame.drop_duplicates:
df1 = df.sort_values(['data','order'])

data = (df1.assign(section=df1['section'].str.split(', '))
           .explode('section')
           .drop_duplicates(['data', 'section', 'order'])
           .groupby('data')['section']
           .agg(', '.join)
           .reindex(df['data'].unique())
           .reset_index())
print (data)
        data                                   section
0     google  code, reason, allergy, advance, service
1  Microsoft                    ab, kl, ef, eh, cd, gf
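To inspect what the explode and drop_duplicates steps produce before the final join, printing the intermediate frame can help (a sketch using the same df1 as above):

# one row per section value, with in-row duplicates removed
tmp = (df1.assign(section=df1['section'].str.split(', '))
          .explode('section')
          .drop_duplicates(['data', 'section', 'order']))
print (tmp)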
Upvotes: 2