Reputation: 393
I am currently working with ratio's, but have hit a snag.
Below is a small sample of the data I'm working with [There are other platforms like Instagram, Messenger etc] to give the below more context:
Date Reach Impressions Clicks Landing_Page Platform
30/05/2017 27447 27939 90 68 Facebook
30/05/2017 24299 24318 80 44 Facebook
30/05/2017 9897 10081 33 25 Facebook
30/05/2017 11696 11721 33 21 Facebook
30/05/2017 53 55 1 Facebook
31/05/2017 46632 68757 213 143 Facebook
31/05/2017 67478 73401 650 424 Facebook
31/05/2017 38831 47577 136 77 Facebook
31/05/2017 46834 52449 135 77 Facebook
31/05/2017 273 531 12 10 Facebook
1/06/2017 48307 72141 221 150 Facebook
1/06/2017 64122 79501 202 106 Facebook
1/06/2017 66810 71033 843 575 Facebook
1/06/2017 46225 50003 138 76 Facebook
1/06/2017 496 1043 16 15 Facebook
I have then created a new column:
df["Click_To_Landing_Ratio] = df["Landing_Page] / df["Clicks"] * 100
I have tried the following:
round(df.pivot_table(index="Date", columns="Platform", values="Click_To_Landing_Ratio"), 3)
round(df.groupby(["Date", "Platform"], axis=0)["Click_To_Landing_Ratio"].mean().unstack(), 3)
When I Pivot this in excel (and create a calculated column), I get
Row Labels Facebook Grand Total
30/05/2017 66.67% 66.67%
31/05/2017 63.79% 63.79%
1/06/2017 64.93% 64.93%
2/06/2017 63.98% 63.98%
When I Pivot or Groupby this in Pandas, I get:
Row Labels Facebook Grand Total
30/05/2017 53.990% 53.990%
31/05/2017 65.871% 65.871%
1/06/2017 67.476% 67.476%
2/06/2017 64.031% 64.031%
I am guessing Pandas is averaging the row values for that day where as excel would be creating a total and dividing it by the total entries (if that makes sense).
In short, my questions are:
Any help would be greatly appreciated.
Upvotes: 0
Views: 282
Reputation: 7038
The issue here as you've properly identified is you're inadvertently averaging the daily Click to Landing values, whereas Excel is computing an average by summing up the values of Clicks and Landing Page visits per day and dividing.
You can do the same thing in pandas
by creating your pivot_table
and summing the values per day (notice I've passed aggfunc='sum'
to pivot_table
). You can then apply division to find the average:
df
Date Reach Impressions Clicks Landing_Page Platform
0 30/05/2017 27447 27939 90 68 Facebook
1 30/05/2017 24299 24318 80 44 Facebook
2 30/05/2017 9897 10081 33 25 Facebook
3 30/05/2017 11696 11721 33 21 Facebook
4 30/05/2017 53 55 1 0 Facebook
5 31/05/2017 46632 68757 213 143 Facebook
6 31/05/2017 67478 73401 650 424 Facebook
7 31/05/2017 38831 47577 136 77 Facebook
8 31/05/2017 46834 52449 135 77 Facebook
9 31/05/2017 273 531 12 10 Facebook
10 1/06/2017 48307 72141 221 150 Facebook
11 1/06/2017 64122 79501 202 106 Facebook
12 1/06/2017 66810 71033 843 575 Facebook
13 1/06/2017 46225 50003 138 76 Facebook
14 1/06/2017 496 1043 16 15 Facebook
t = df.pivot_table(index="Date", columns="Platform", values=['Clicks', 'Landing_Page'], aggfunc='sum')
Clicks Landing_Page
Platform Facebook Facebook
Date
1/06/2017 1420 922
30/05/2017 237 158
31/05/2017 1146 731
t[('Landing_Page', 'Facebook')].div(t[('Clicks', 'Facebook')]).apply(lambda x: '{:.2%}'.format(x))
Date
1/06/2017 64.90%
30/05/2017 66.70%
31/05/2017 63.80%
dtype: object
Upvotes: 1