Reputation: 1897
I'm new to Pandas. I created this pivot table, but I need to figure out how to apply a function within each day on the 'is_match' values only. See img below for head of data.
What I need is the % of values (reward_count) which is true for each day, per app (rows).
i.e. for date = '2015-10-22', total(true+false) = 59,101. % true would be 1,080/59,101 = 0.018%. For each date, I would just want to see this % true value in place of true/false counts.
original data:
date app_name is_match rewards_count
10/22/15 NFL HUDDLE 2016 FALSE 45816
10/22/15 NFL HUDDLE 2016 TRUE 1080
10/22/15 NFL HUDDLE 2016 FALSE 8
10/22/15 NFL HUDDLE 2016 FALSE 128239
10/23/15 NFL HUDDLE 2016 TRUE 908
10/23/15 NFL HUDDLE 2016 FALSE 18
10/24/15 NFL HUDDLE 2016 TRUE 638
The data frame:
table = pd.pivot_table(df, index=['app_name'],
columns=['date','is_match'],
values = 'rewards_count')
Thank you so much for your help. I have spend half the day looking through Pandas documentation but do not know what I'm looking for / what to reference.
Upvotes: 0
Views: 4051
Reputation: 85442
Using a multi index can help:
table = pd.pivot_table(apps, index=['app_name', 'date'],
columns=['is_match'],
values = 'rewards_count',
aggfunc=np.sum,
margins=True)
I sum up all counts with aggfunc=np.sum
and calculate the
sum of True
and False
with margins=True
.
These sums end up in All
:
is_match False True All
app_name date
NFL HUDDLE 2016 10/22/15 174063 1080 175143
10/23/15 18 908 926
10/24/15 79322 638 79960
All 253403 2626 256029
I add two new columns that hold the percentages:
table['percent_false'] = table[False] / table.All * 100
table['percent_true'] = table[True] / table.All * 100
The results looks like this:
is_match False True All percent_false percent_true
app_name date
NFL HUDDLE 2016 10/22/15 174063 1080 175143 99.383361 0.616639
10/23/15 18 908 926 1.943844 98.056156
10/24/15 79322 638 79960 99.202101 0.797899
All 253403 2626 256029 98.974335 1.025665
There is a lot of extra stuff in the table. Selecting only what you want:
percent_true = table.ix[:-1, ['percent_true']]
gives:
is_match percent_true
app_name date
NFL HUDDLE 2016 10/22/15 0.616639
10/23/15 98.056156
10/24/15 0.797899
If you want the mean of the counts, as you did in your approach,
don't use aggfunc=np.sum
. You also need to sum up by hand:
table = pd.pivot_table(apps, index=['app_name', 'date'],
columns=['is_match'],
values = 'rewards_count')
table['total'] = table[False] + table[True]
table['percent_false'] = table[False] / table.total * 100
table['percent_true'] = table[True] / table.total * 100
Now the result looks like this:
is_match False True total percent_false percent_true
app_name date
NFL HUDDLE 2016 10/22/15 58021 1080 59101 98.172620 1.827380
10/23/15 18 908 926 1.943844 98.056156
10/24/15 79322 638 79960 99.202101 0.797899
Again, select only the relevant parts:
percent_true = table[['percent_true']]
gives:
is_match percent_true
app_name date
NFL HUDDLE 2016 10/22/15 1.827380
10/23/15 98.056156
10/24/15 0.797899
Upvotes: 1