Reputation: 123
I have data that I want to groupby and sum along a multi-index, but then also have the sum of just the first part of the multi-index. So the idea would be the index would be the multiindex, and then I could see both the size of that and the size of just its first group. So for example I have a dataframe like:
d = {'attr_1': ['A','A','A','B','B'], 'attr_2': ['a','a','b','a','b'],'val': [1,2,3,4,5]}
df = pd.DataFrame(d)
df
a_1 a_2 val
0 A a 1
1 A a 2
2 A b 3
3 B a 4
4 B b 5
What I would like to make is:
val sum_a1
a_1 a_2
A a 3 6
A b 3 6
B a 4 9
B b 5 9
I'm looking for a way to do this cleanly. My current approach is to make a new dataframe, turn it into a dictionary, reset the index, and map the dictionary, but it seems like there should be an easier and more pythonic way to do this.
a = df.groupby(['a_1','a_2']).sum()
b = df.groupby(['a_1']).sum()
my_dict = b.to_dict()['val']
a = a.reset_index()
a['sum_a1'] = a.a_1.map(my_dict)
a
a_1 a_2 val sum_a1
0 A a 3 6
1 A b 3 6
2 B a 4 9
3 B b 5 9
Any help would be greatly appreciated. Apologies if this is a repeated question I searched and couldn't find anything that was quite the same.
Upvotes: 1
Views: 64
Reputation: 294258
With set_index
and sum
letting join
align the indices.
df.set_index(['attr_1', 'attr_2']).pipe(
lambda d: d.sum(level=[0, 1]).join(d.val.rename('sum_a1').sum(level=0))
)
val sum_a1
attr_1 attr_2
A a 3 6
b 3 6
B a 4 9
b 5 9
Upvotes: 1
Reputation: 323226
You are looking for transform
a = df.groupby(['attr_1','attr_2']).sum()
a['sum_a1']=a.groupby(level=0)['val'].transform('sum')
a
Out[14]:
val sum_a1
attr_1 attr_2
A a 3 6
b 3 6
B a 4 9
b 5 9
Upvotes: 2