Reputation: 541
I've got a set of data in pandas which has been grouped by two factors so I could sum over the groups separately. In other words:
grouped = df.groupby(['A','B'])['C'].sum()
Now, I would like to take the std
of this sum "across" B
so I can see how this deviation changes for the different A
values. How can I do this aggregate operation across one "dimension" or "index" of the grouped data?
I'm new to pandas, so this may be easy... but thanks for the help!
Upvotes: 1
Views: 4687
Reputation: 863216
It seems you need groupby
with parameter level
:
grouped = df.groupby(['A','B'])['C'].sum().groupby(level='B').std()
Sample:
np.random.seed(100)
df = pd.DataFrame(np.random.randint(5, size=(10,3)), columns=list('ABC'))
print (df)
A B C
0 0 0 3
1 0 2 4
2 2 2 2
3 2 1 0
4 0 4 3
5 4 2 0
6 3 1 2
7 3 4 4
8 1 3 4
9 4 3 3
grouped = df.groupby(['A','B'])['C'].sum().groupby(level='B').std().reset_index()
print (grouped)
B C
0 0 NaN
1 1 1.414214
2 2 2.000000
3 3 0.707107
4 4 0.707107
grouped = df.groupby(['A','B'])['C'].sum().groupby(level=1).std().reset_index()
print (grouped)
B C
0 0 NaN
1 1 1.414214
2 2 2.000000
3 3 0.707107
4 4 0.707107
Explanation, per partes:
#groupby by columns A, B, aggregate column C
#->output is Series with MultiIndex
grouped1 = df.groupby(['A','B'])['C'].sum()
print (grouped1)
A B
0 0 3
2 4
4 3
1 3 4
2 1 0
2 2
3 1 2
4 4
4 2 0
3 3
Name: C, dtype: int32
print (type(grouped1))
<class 'pandas.core.series.Series'>
print (grouped1.index)
MultiIndex(levels=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]],
labels=[[0, 0, 0, 1, 2, 2, 3, 3, 4, 4], [0, 2, 4, 3, 1, 2, 1, 4, 2, 3]],
names=['A', 'B'])
#groupby by level B of MultiIndex
#->output is Series with MultiIndex, so reset_index for df
grouped = grouped1.groupby(level='B').std().reset_index()
print (grouped)
B C
0 0 NaN
1 1 1.414214
2 2 2.000000
3 3 0.707107
4 4 0.707107
#all together
grouped = df.groupby(['A','B'])['C'].sum().groupby(level='B').std().reset_index()
print (grouped)
B C
0 0 NaN
1 1 1.414214
2 2 2.000000
3 3 0.707107
4 4 0.707107
Upvotes: 1