Calculating Percent of Total using groupby

Question

I am having trouble trying to find a simple way to get the market share of products out of the total market. As an example, my dataframe is like the below:

For example, I have a dataframe like this below. Let's say product A, B and C belong to a market called 1, and D, E, F belong to markets 2, 3, 4 respectively. What I want to find is for each unique quarter (i.e. 1/6/2020 is quarter 2 of 2020), what is the market share of A, B and C out of the total market. For example, if we want the market share of A, B and C (USA market) out of quarter 2 of 2020, then we need to take 100+200+300 divide by 100+200+300+400+500+600, which gives 600/2100 = 28.57%

I am not sure what is the right way to approach it, so far I have to turn the whole dataframe into a 2d list and try to use for loops. I hope there is a neater and cleaner way to solve this.

  Product   Date       Value   
0   A        1/6/2020   100
1   B        1/6/2020   200
2   C        1/6/2020   300
3   D        1/6/2020   400
4   E        1/6/2020   500
5   F        1/6/2020   600
6   A        1/9/2020   600
7   B        1/9/2020   500
8   C        1/9/2020   400
9   D        1/9/2020   300
10  E        1/9/2020   200
11  F        1/9/2020   100

Alex Watt · Accepted Answer

You're on the right track to consider groupby!

Your dataframe needs to have the dimensions you mentioned, though -- the market, and the quarter. In addition, you probably want your Date column to be a datetime64.

Here is a code block that constructs a dataframe similar to what you have currently:

import pandas as pd

df = pd.DataFrame()
df["Product"] = ["A", "B", "C", "D", "E", "F"] * 2
df["Date"] = ["1/6/2020"] * 6 + ["1/9/2020"] * 6
df["Date"] = df["Date"].astype("datetime64[ns]")
df["Value"] = [100, 200, 300, 400, 500, 600] * 2

You might want to add a "Market" column, perhaps by defining a mapping from a product to a market, and adding it to your dataframe. Similarly, you could compute the quarter for each entry (although in your example, you seem to be saying that you want to treat the date object as the quarter).

products_to_markets = {
    "A": "USA", "B": "USA", "C": "USA",
    "D": "Canada", "E": "Canada", "F": "Canada"
}
df["Market"] = df["Product"].map(products_to_markets)
df["Quarter"] = df["Date"].dt.to_period("Q")

Now you can begin to perform some of the other calculations you're interested in. For instance, you can see the total value per market per quarter:

df.groupby(["Quarter", "Market"]).sum()

I think what you're looking for is something like this:

value_per_quarter = df.groupby("Quarter").sum()
df.groupby(["Quarter", "Market"]).sum() / value_per_quarter

Which yields:

                   Value
Quarter Market
2020Q1  Canada  0.714286
        USA     0.285714

Calculating Percent of Total using groupby

Answers (2)

Related Questions