Reputation: 10697
df = pd.DataFrame({'a':['y',NaN,'y',NaN,NaN,'x','x','y',NaN],'b':[NaN,'x',NaN,'y','x',NaN,NaN,NaN,'y'],'d':[1,0,0,1,1,1,0,1,0]})
I'm trying to summarize this dataframe using sum. I thought df.groupby(['a','b']).aggregate(sum) would work, but it returns an empty Series.
How can I achieve this result?
   a  b
x  1  1
y  2  1
Upvotes: 0
Views: 2064
Reputation: 879103
import numpy as np
import pandas as pd
NaN = np.nan
df = pd.DataFrame(
    {'a':['y',NaN,'y',NaN,NaN,'x','x','y',NaN],
     'b':[NaN,'x',NaN,'y','x',NaN,NaN,NaN,'y'],
     'd':[32,12,55,98,23,11,9,91,3]})
melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])
result = pd.pivot_table(melted, values='d', index=['value'], columns=['variable'],
                        aggfunc=np.median)
print(result)
yields
variable     a     b
value
x         10.0  17.5
y         55.0  50.5
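As for why the original groupby attempt comes back empty: in the question's data every row is missing either a or b, and groupby discards rows whose key contains NaN by default, so no complete (a, b) pair is left to aggregate. A minimal sketch of that check, using the question's data (the names rows_with_both and grouped are only illustrative):

```python
import numpy as np
import pandas as pd

NaN = np.nan
# Data from the question.
df = pd.DataFrame({
    'a': ['y', NaN, 'y', NaN, NaN, 'x', 'x', 'y', NaN],
    'b': [NaN, 'x', NaN, 'y', 'x', NaN, NaN, NaN, 'y'],
    'd': [1, 0, 0, 1, 1, 1, 0, 1, 0],
})

# Keep only rows where both key columns are present: none survive.
rows_with_both = df.dropna(subset=['a', 'b'])
print(len(rows_with_both))

# groupby drops rows with NaN in the keys by default, so the result is empty.
grouped = df.groupby(['a', 'b'])['d'].sum()
print(grouped.empty)
```

This is why the columns have to be treated one at a time (or melted into a single key column) rather than grouped jointly.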
Explanation:
Melting the DataFrame with melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])
produces
     d variable value
0   32        a     y
1   12        a   NaN
2   55        a     y
3   98        a   NaN
4   23        a   NaN
5   11        a     x
6    9        a     x
7   91        a     y
8    3        a   NaN
9   32        b   NaN
10  12        b     x
11  55        b   NaN
12  98        b     y
13  23        b     x
14  11        b   NaN
15   9        b   NaN
16  91        b   NaN
17   3        b     y
and now we can use pd.pivot_table to pivot and aggregate the d values:
result = pd.pivot_table(melted, values='d', index=['value'], columns=['variable'],
                        aggfunc=np.median)
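The example above uses a median on made-up d values so the aggregation is easy to see. For the sum asked about in the question, the same melt-then-pivot approach should work with the aggregation swapped; here is a sketch on the question's original data, where the string 'sum' and the name summed are my choices rather than anything from the question:

```python
import numpy as np
import pandas as pd

NaN = np.nan
# Data from the question: 'd' holds the values to be summed.
df = pd.DataFrame({
    'a': ['y', NaN, 'y', NaN, NaN, 'x', 'x', 'y', NaN],
    'b': [NaN, 'x', NaN, 'y', 'x', NaN, NaN, NaN, 'y'],
    'd': [1, 0, 0, 1, 1, 1, 0, 1, 0],
})

# Stack columns 'a' and 'b' into one 'value' column, keeping 'd' alongside.
melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])

# Rows whose 'value' is NaN are dropped during the pivot; the rest are summed
# per (value, variable) pair.
summed = pd.pivot_table(melted, values='d', index='value',
                        columns='variable', aggfunc='sum')
print(summed)
```

The rows are labelled x and y and the columns a and b, matching the layout requested in the question.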
Note that aggfunc can also take a list of functions, such as [np.sum, np.median, np.min, np.max, np.std], if you wish to summarize the data in more than one way.
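A short sketch of the list form, on the same example data as above. It passes string names instead of the NumPy functions, which is a substitution on my part (some recent pandas releases warn when NumPy reducers are passed directly); the name summary is only illustrative:

```python
import numpy as np
import pandas as pd

NaN = np.nan
df = pd.DataFrame({
    'a': ['y', NaN, 'y', NaN, NaN, 'x', 'x', 'y', NaN],
    'b': [NaN, 'x', NaN, 'y', 'x', NaN, NaN, NaN, 'y'],
    'd': [32, 12, 55, 98, 23, 11, 9, 91, 3],
})
melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])

# One block of columns per aggregation; the columns form a two-level index
# of (function name, variable).
summary = pd.pivot_table(melted, values='d', index='value', columns='variable',
                         aggfunc=['sum', 'median', 'min', 'max', 'std'])
print(summary)

# A single statistic can be pulled out by its function name.
print(summary['median'])
```

Selecting summary['median'] gives back the same table as the single-function call shown earlier.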
Upvotes: 2