Reputation: 844
If I have this output:
AnnualReportDate BoardID Gender
20150901 2355 F 2
M 3
8888 F 2
M 3
20160901 5655 F 3
M 3
9999 M 6
dtype: int64
How can I calculate percentage F and M in each year and board? This is the output I'm trying to get:
20150901 2355 2/5 3/5
20150901 8888 2/5 3/5
20160901 5655 3/6 3/6
20160901 9999 0/6 1/6
I'm fairly new to pandas.
Upvotes: 0
Views: 51
Reputation: 18201
If s
is the name of your Series
, you could do
df = s.unstack()
df.div(df.sum(1), 0)
In practice:
In [217]: s
Out[217]:
AnnualReportDate BoardID Gender
20150901 2355 F 2
M 3
8888 F 2
M 3
20160901 5655 F 3
M 3
9999 M 6
dtype: int64
In [218]: df = s.unstack()
In [219]: df.div(df.sum(1), 0)
Out[219]:
Gender F M
AnnualReportDate BoardID
20150901 2355 0.4 0.6
8888 0.4 0.6
20160901 5655 0.5 0.5
9999 NaN 1.0
Upvotes: 1
Reputation: 323226
unstack
and divided by sum
s1=s.unstack(fill_value=0)
s1.div(s1.sum(1),0)
Out[484]:
Gender F M
AnnualReportDate BoardID
20150901 2355 0.4 0.6
8888 0.4 0.6
20160901 5655 0.5 0.5
9999 0.0 1.0
Upvotes: 3
Reputation: 645
Here's a good example, treating your series as a dataframe though:
from io import StringIO
df = pd.read_table(StringIO('''
AnnualReportDate BoardID Gender val
20150901 2355 F 2
20150901 2355 M 3
20150901 8888 F 2
20150901 8888 M 3
20160901 5655 F 3
20160901 5655 M 3
20160901 9999 M 6
'''), delim_whitespace=True, index_col=['AnnualReportDate', 'BoardID', 'Gender'])
df.div(df.groupby(['AnnualReportDate','BoardID']).transform('sum')).unstack()
Output:
Gender F M
AnnualReportDate BoardID
20150901 2355 0.4 0.6
8888 0.4 0.6
20160901 5655 0.5 0.5
9999 NaN 1.0
Upvotes: 1