teller.py3
teller.py3

Reputation: 844

Access data after using groupby() with pandas

If I have this output:

AnnualReportDate  BoardID  Gender
20150901          2355     F         2
                           M         3
                  8888     F         2
                           M         3
20160901          5655     F         3
                           M         3
                  9999     M         6
dtype: int64

How can I calculate percentage F and M in each year and board? This is the output I'm trying to get:

20150901    2355    2/5    3/5
20150901    8888    2/5    3/5
20160901    5655    3/6    3/6
20160901    9999    0/6    1/6

I'm fairly new to pandas.

Upvotes: 0

Views: 51

Answers (3)

fuglede
fuglede

Reputation: 18201

If s is the name of your Series, you could do

df = s.unstack()
df.div(df.sum(1), 0)

In practice:

In [217]: s
Out[217]:
AnnualReportDate  BoardID  Gender
20150901          2355     F         2
                           M         3
                  8888     F         2
                           M         3
20160901          5655     F         3
                           M         3
                  9999     M         6
dtype: int64

In [218]: df = s.unstack()

In [219]: df.div(df.sum(1), 0)
Out[219]:
Gender                      F    M
AnnualReportDate BoardID
20150901         2355     0.4  0.6
                 8888     0.4  0.6
20160901         5655     0.5  0.5
                 9999     NaN  1.0

Upvotes: 1

BENY
BENY

Reputation: 323226

unstack and divided by sum

s1=s.unstack(fill_value=0)
s1.div(s1.sum(1),0)
Out[484]: 
Gender                      F    M
AnnualReportDate BoardID          
20150901         2355     0.4  0.6
                 8888     0.4  0.6
20160901         5655     0.5  0.5
                 9999     0.0  1.0

Upvotes: 3

Jake Morris
Jake Morris

Reputation: 645

Here's a good example, treating your series as a dataframe though:

from io import StringIO

df = pd.read_table(StringIO('''
AnnualReportDate  BoardID  Gender    val
20150901          2355     F         2
20150901          2355     M         3
20150901          8888     F         2
20150901          8888      M         3
20160901          5655     F         3
20160901          5655     M         3
20160901          9999     M         6
'''), delim_whitespace=True, index_col=['AnnualReportDate', 'BoardID', 'Gender'])

df.div(df.groupby(['AnnualReportDate','BoardID']).transform('sum')).unstack()

Output:

                    Gender  F       M
AnnualReportDate    BoardID     
20150901            2355    0.4     0.6
                    8888    0.4     0.6
20160901            5655    0.5     0.5
                    9999    NaN     1.0

Upvotes: 1

Related Questions