Pandas Column Calculations in PivotTable

Question

I'm new to Pandas. I created this pivot table, but I need to figure out how to apply a function within each day on the 'is_match' values only. See img below for head of data.

What I need is the % of values (reward_count) which is true for each day, per app (rows).

i.e. for date = '2015-10-22', total(true+false) = 59,101. % true would be 1,080/59,101 = 0.018%. For each date, I would just want to see this % true value in place of true/false counts.

original data:

date    app_name    is_match    rewards_count
10/22/15    NFL HUDDLE 2016 FALSE   45816
10/22/15    NFL HUDDLE 2016 TRUE    1080
10/22/15    NFL HUDDLE 2016 FALSE   8
10/22/15    NFL HUDDLE 2016 FALSE   128239
10/23/15    NFL HUDDLE 2016 TRUE    908
10/23/15    NFL HUDDLE 2016 FALSE   18
10/24/15    NFL HUDDLE 2016 TRUE    638

The data frame:

table = pd.pivot_table(df, index=['app_name'], 
                       columns=['date','is_match'],
                       values = 'rewards_count')

sample data

Thank you so much for your help. I have spend half the day looking through Pandas documentation but do not know what I'm looking for / what to reference.

Mike M&#252;ller · Accepted Answer

Using a multi index can help:

table = pd.pivot_table(apps, index=['app_name', 'date'], 
                       columns=['is_match'],
                       values = 'rewards_count',
                       aggfunc=np.sum,
                       margins=True)

I sum up all counts with aggfunc=np.sum and calculate the sum of True and False with margins=True. These sums end up in All:

is_match                   False  True     All
app_name        date                          
NFL HUDDLE 2016 10/22/15  174063  1080  175143
                10/23/15      18   908     926
                10/24/15   79322   638   79960
All                       253403  2626  256029

I add two new columns that hold the percentages:

table['percent_false']  = table[False] / table.All * 100
table['percent_true']  = table[True] / table.All * 100

The results looks like this:

is_match                   False  True     All  percent_false  percent_true
app_name        date                                                       
NFL HUDDLE 2016 10/22/15  174063  1080  175143      99.383361      0.616639
                10/23/15      18   908     926       1.943844     98.056156
                10/24/15   79322   638   79960      99.202101      0.797899
All                       253403  2626  256029      98.974335      1.025665

There is a lot of extra stuff in the table. Selecting only what you want:

percent_true = table.ix[:-1, ['percent_true']]

gives:

is_match                  percent_true
app_name        date                  
NFL HUDDLE 2016 10/22/15      0.616639
                10/23/15     98.056156
                10/24/15      0.797899

If you want the mean of the counts, as you did in your approach, don't use aggfunc=np.sum. You also need to sum up by hand:

table = pd.pivot_table(apps, index=['app_name', 'date'], 
                       columns=['is_match'],
                       values = 'rewards_count')
table['total'] = table[False] + table[True]
table['percent_false']  = table[False] / table.total * 100
table['percent_true']  = table[True] / table.total * 100

Now the result looks like this:

is_match                  False  True  total  percent_false  percent_true
app_name        date                                                     
NFL HUDDLE 2016 10/22/15  58021  1080  59101      98.172620      1.827380
                10/23/15     18   908    926       1.943844     98.056156
                10/24/15  79322   638  79960      99.202101      0.797899

Again, select only the relevant parts:

percent_true = table[['percent_true']]

gives:

is_match                  percent_true
app_name        date                  
NFL HUDDLE 2016 10/22/15      1.827380
                10/23/15     98.056156
                10/24/15      0.797899

Pandas Column Calculations in PivotTable

Answers (1)

Related Questions