ilovewt
ilovewt

Reputation: 1023

Calculating Percent of Total using groupby

I am having trouble trying to find a simple way to get the market share of products out of the total market. As an example, my dataframe is like the below:

For example, I have a dataframe like this below. Let's say product A, B and C belong to a market called 1, and D, E, F belong to markets 2, 3, 4 respectively. What I want to find is for each unique quarter (i.e. 1/6/2020 is quarter 2 of 2020), what is the market share of A, B and C out of the total market. For example, if we want the market share of A, B and C (USA market) out of quarter 2 of 2020, then we need to take 100+200+300 divide by 100+200+300+400+500+600, which gives 600/2100 = 28.57%

I am not sure what is the right way to approach it, so far I have to turn the whole dataframe into a 2d list and try to use for loops. I hope there is a neater and cleaner way to solve this.

  Product   Date       Value   
0   A        1/6/2020   100
1   B        1/6/2020   200
2   C        1/6/2020   300
3   D        1/6/2020   400
4   E        1/6/2020   500
5   F        1/6/2020   600
6   A        1/9/2020   600
7   B        1/9/2020   500
8   C        1/9/2020   400
9   D        1/9/2020   300
10  E        1/9/2020   200
11  F        1/9/2020   100

Upvotes: 1

Views: 125

Answers (2)

Alex Watt
Alex Watt

Reputation: 937

You're on the right track to consider groupby!

Your dataframe needs to have the dimensions you mentioned, though -- the market, and the quarter. In addition, you probably want your Date column to be a datetime64.

Here is a code block that constructs a dataframe similar to what you have currently:

import pandas as pd

df = pd.DataFrame()
df["Product"] = ["A", "B", "C", "D", "E", "F"] * 2
df["Date"] = ["1/6/2020"] * 6 + ["1/9/2020"] * 6
df["Date"] = df["Date"].astype("datetime64[ns]")
df["Value"] = [100, 200, 300, 400, 500, 600] * 2

You might want to add a "Market" column, perhaps by defining a mapping from a product to a market, and adding it to your dataframe. Similarly, you could compute the quarter for each entry (although in your example, you seem to be saying that you want to treat the date object as the quarter).

products_to_markets = {
    "A": "USA", "B": "USA", "C": "USA",
    "D": "Canada", "E": "Canada", "F": "Canada"
}
df["Market"] = df["Product"].map(products_to_markets)
df["Quarter"] = df["Date"].dt.to_period("Q")

Now you can begin to perform some of the other calculations you're interested in. For instance, you can see the total value per market per quarter:

df.groupby(["Quarter", "Market"]).sum()

I think what you're looking for is something like this:

value_per_quarter = df.groupby("Quarter").sum()
df.groupby(["Quarter", "Market"]).sum() / value_per_quarter

Which yields:

                   Value
Quarter Market
2020Q1  Canada  0.714286
        USA     0.285714

Upvotes: 1

Sander van den Oord
Sander van den Oord

Reputation: 12838

First you have to create a dataframe which maps your products to your markets.

Then use pd.crosstab() to get a nice pivot table with argument normalize=index giving you the percentages per row.

import pandas as pd
from io import StringIO

text = """
  Product   Date       Value   
0   A        1/6/2020   100
1   B        1/6/2020   200
2   C        1/6/2020   300
3   D        1/6/2020   400
4   E        1/6/2020   500
5   F        1/6/2020   600
6   A        1/9/2020   600
7   B        1/9/2020   500
8   C        1/9/2020   400
9   D        1/9/2020   300
10  E        1/9/2020   200
11  F        1/9/2020   100
"""

# create sample dataframe
df = pd.read_csv(StringIO(text), header=0, sep='\s+')

# create translation of products to markets
market_df = pd.DataFrame([
    ['A', 1], ['B', 1], ['C', 1], 
    ['D', 2], ['E', 3], ['F', 4]], 
    columns=['Product', 'Market'],
)

# merge to get products mapped to markets
merged_df = pd.merge(
    df, 
    market_df, 
    how='left', 
    on='Product',
)

# crosstab calculates totals per market and date
# normalize='index' calculates percentages over rows
pd.crosstab(
    merged_df['Date'],
    merged_df['Market'], 
    merged_df['Value'], 
    aggfunc='sum', 
    normalize='index',
)

Resulting dataframe:

            Market  
Date        1           2           3           4           
1/6/2020    0.285714    0.190476    0.238095    0.285714
1/9/2020    0.714286    0.142857    0.095238    0.047619

Upvotes: 1

Related Questions