Kbbm
Kbbm

Reputation: 375

aggregation within a pivot table index

So let me try this again. I created a pivot table with the following:

df3.reset_index(inplace=True)
surveys = df3.groupby(['cohort','nps']).agg({'id': pd.Series.nunique})
surveys['%'] = surveys['id'] / surveys.id.sum()

which returned:

cohort    status       count     %
---------------------------------
2017-01 sad         188  0.009276
        ok           53  0.002615
        happy       253  0.012483
2017-02 sad         174  0.008585
        ok          113  0.005575
        happy       247  0.012187
2017-03 sad         221  0.010904
        ok          60   0.002960
        happy       299  0.014752

and so on.

I am trying to figure out how I can get the percentages to add up specifically for their respective cohort.

so ideally that would look like:

2017-01 sad         188  0.38
        ok          53   0.11
        happy       253  0.51

then I could unstack the nps and subtract the promoter from the detractor, ignoring the passive.

Has anyone here ever done something like this?

Upvotes: 1

Views: 40

Answers (2)

Andy Hayden
Andy Hayden

Reputation: 375475

You might like to use pivot_table instead:

In [11]: p = surveys.pivot_table("count", "cohort", "status", aggfunc="sum", fill_value=0)

In [12]: p
Out[12]:
status   happy   ok  sad
cohort
2017-01    253   53  188
2017-02    247  113  174
2017-03    299   60  221

In [13]: p.div(p.sum(axis=1), axis=0)
Out[13]:
status      happy        ok       sad
cohort
2017-01  0.512146  0.107287  0.380567
2017-02  0.462547  0.211610  0.325843
2017-03  0.515517  0.103448  0.381034

It depends what your next objective is, but this might be a better format...

Upvotes: 1

sacuL
sacuL

Reputation: 51335

Hard to say without seeing how you got your original data, but you can get the result you want by replacing your last line:

surveys['%'] = surveys['id'] / surveys.id.sum()

With:

surveys['%'] = surveys.groupby('cohort')['count'].transform(lambda x: x/sum(x))

For example, starting with your dataframe like:

>>> surveys
    cohort status  count
0  2017-01    sad    188
1  2017-01     ok     53
2  2017-01  happy    253
3  2017-02    sad    174
4  2017-02     ok    113
5  2017-02  happy    247
6  2017-03    sad    221
7  2017-03     ok     60
8  2017-03  happy    299

Then:

>>> surveys['%'] = surveys.groupby('cohort')['count'].transform(lambda x: x/sum(x))
>>> surveys
    cohort status  count         %
0  2017-01    sad    188  0.380567
1  2017-01     ok     53  0.107287
2  2017-01  happy    253  0.512146
3  2017-02    sad    174  0.325843
4  2017-02     ok    113  0.211610
5  2017-02  happy    247  0.462547
6  2017-03    sad    221  0.381034
7  2017-03     ok     60  0.103448
8  2017-03  happy    299  0.515517

Upvotes: 2

Related Questions