AdrianC
AdrianC

Reputation: 393

Python | Groupby | Pivot | Using Percentages

I am currently working with ratio's, but have hit a snag.

Below is a small sample of the data I'm working with [There are other platforms like Instagram, Messenger etc] to give the below more context:

Date        Reach   Impressions Clicks  Landing_Page    Platform
30/05/2017  27447   27939       90      68              Facebook
30/05/2017  24299   24318       80      44              Facebook
30/05/2017  9897    10081       33      25              Facebook
30/05/2017  11696   11721       33      21              Facebook
30/05/2017  53      55          1                       Facebook
31/05/2017  46632   68757       213     143             Facebook
31/05/2017  67478   73401       650     424             Facebook
31/05/2017  38831   47577       136     77              Facebook
31/05/2017  46834   52449       135     77              Facebook
31/05/2017  273     531         12      10              Facebook
1/06/2017   48307   72141       221     150             Facebook
1/06/2017   64122   79501       202     106             Facebook
1/06/2017   66810   71033       843     575             Facebook
1/06/2017   46225   50003       138     76              Facebook
1/06/2017   496     1043        16      15              Facebook

I have then created a new column:

df["Click_To_Landing_Ratio] = df["Landing_Page] / df["Clicks"] * 100

I have tried the following:

round(df.pivot_table(index="Date", columns="Platform", values="Click_To_Landing_Ratio"), 3)

round(df.groupby(["Date", "Platform"], axis=0)["Click_To_Landing_Ratio"].mean().unstack(), 3)

When I Pivot this in excel (and create a calculated column), I get

Row Labels  Facebook    Grand Total
30/05/2017  66.67%      66.67%
31/05/2017  63.79%      63.79%
1/06/2017   64.93%      64.93%
2/06/2017   63.98%      63.98%

When I Pivot or Groupby this in Pandas, I get:

Row Labels  Facebook    Grand Total
30/05/2017  53.990%     53.990% 
31/05/2017  65.871%     65.871% 
1/06/2017   67.476%     67.476%
2/06/2017   64.031%     64.031%

I am guessing Pandas is averaging the row values for that day where as excel would be creating a total and dividing it by the total entries (if that makes sense).

In short, my questions are:

  1. If you have ratios (percentages) on each individual row, is there a way to replicate the same result in Excel, but in Pandas?
  2. Can you get totals using groupby similar to Pandas (margins=True)?

Any help would be greatly appreciated.

Upvotes: 0

Views: 282

Answers (1)

Andrew L
Andrew L

Reputation: 7038

The issue here as you've properly identified is you're inadvertently averaging the daily Click to Landing values, whereas Excel is computing an average by summing up the values of Clicks and Landing Page visits per day and dividing.

You can do the same thing in pandas by creating your pivot_table and summing the values per day (notice I've passed aggfunc='sum' to pivot_table). You can then apply division to find the average:

df
          Date  Reach  Impressions  Clicks  Landing_Page  Platform
0   30/05/2017  27447        27939      90            68  Facebook
1   30/05/2017  24299        24318      80            44  Facebook
2   30/05/2017   9897        10081      33            25  Facebook
3   30/05/2017  11696        11721      33            21  Facebook
4   30/05/2017     53           55       1             0  Facebook
5   31/05/2017  46632        68757     213           143  Facebook
6   31/05/2017  67478        73401     650           424  Facebook
7   31/05/2017  38831        47577     136            77  Facebook
8   31/05/2017  46834        52449     135            77  Facebook
9   31/05/2017    273          531      12            10  Facebook
10   1/06/2017  48307        72141     221           150  Facebook
11   1/06/2017  64122        79501     202           106  Facebook
12   1/06/2017  66810        71033     843           575  Facebook
13   1/06/2017  46225        50003     138            76  Facebook
14   1/06/2017    496         1043      16            15  Facebook

t = df.pivot_table(index="Date", columns="Platform", values=['Clicks', 'Landing_Page'], aggfunc='sum')
             Clicks Landing_Page
Platform   Facebook     Facebook
Date                            
1/06/2017      1420          922
30/05/2017      237          158
31/05/2017     1146          731

t[('Landing_Page', 'Facebook')].div(t[('Clicks', 'Facebook')]).apply(lambda x: '{:.2%}'.format(x))
Date
1/06/2017     64.90%
30/05/2017    66.70%
31/05/2017    63.80%
dtype: object

Upvotes: 1

Related Questions