Reputation: 596
My data is as follow :
set_name | cases | label
0 {a,b,c} 7 'a b c'
1 {j,i,a} 2 'j i a'
2 {a,c} 1 'a c'
3 {b,c,a} 2 'b c a'
i would like to groupby on set_name with sum on cases . but - i need the label to correspond to the row with max cases . so the output will look like :
set_name | cases | label
0 {a,b,c} 9 'a b c'
1 {j,i,a} 2 'j i a'
2 {a,c} 1 'a c'
Since both {a,b,c} and {b,c,a} are equal (they are set) i want to sum them up. i've tried using the solution suggested here but since groupby cant work on sets i cant proceed with it
Upvotes: 1
Views: 58
Reputation: 862396
Becuase set
s are unhashable one possible solution is use frozenset
s and aggregate by GroupBy.agg
:
df1 = (df.sort_values(by='cases', ascending=False)
.groupby(df['set_name'].apply(frozenset))
.agg({'set_name': 'first',
'cases':'sum',
'label':'first'})
.reset_index(drop=True))
print (df1)
set_name cases label
0 {a, c, b} 9 a b c
1 {i, j, a} 2 j i a
2 {a, c} 1 a c
Upvotes: 2
Reputation: 19947
(
df.sort_values(by='cases', ascending=False)
.groupby(df.set_name.apply(sorted).apply(str))
.agg(set_name=('set_name', 'first'),
cases=('cases', 'sum'),
lalbel=('label', 'first'),
)
.reset_index(drop=True)
)
Upvotes: 1