Reputation: 1023
I am having trouble trying to find a simple way to get the market share of products out of the total market. As an example, my dataframe is like the below:
For example, I have a dataframe like this below. Let's say product A, B and C belong to a market called 1, and D, E, F belong to markets 2, 3, 4 respectively. What I want to find is for each unique quarter (i.e. 1/6/2020 is quarter 2 of 2020), what is the market share of A, B and C out of the total market. For example, if we want the market share of A, B and C (USA market) out of quarter 2 of 2020, then we need to take 100+200+300 divide by 100+200+300+400+500+600, which gives 600/2100 = 28.57%
I am not sure what is the right way to approach it, so far I have to turn the whole dataframe into a 2d list and try to use for loops. I hope there is a neater and cleaner way to solve this.
Product Date Value
0 A 1/6/2020 100
1 B 1/6/2020 200
2 C 1/6/2020 300
3 D 1/6/2020 400
4 E 1/6/2020 500
5 F 1/6/2020 600
6 A 1/9/2020 600
7 B 1/9/2020 500
8 C 1/9/2020 400
9 D 1/9/2020 300
10 E 1/9/2020 200
11 F 1/9/2020 100
Upvotes: 1
Views: 125
Reputation: 937
You're on the right track to consider groupby
!
Your dataframe needs to have the dimensions you mentioned, though -- the market, and the quarter. In addition, you probably want your Date column to be a datetime64.
Here is a code block that constructs a dataframe similar to what you have currently:
import pandas as pd
df = pd.DataFrame()
df["Product"] = ["A", "B", "C", "D", "E", "F"] * 2
df["Date"] = ["1/6/2020"] * 6 + ["1/9/2020"] * 6
df["Date"] = df["Date"].astype("datetime64[ns]")
df["Value"] = [100, 200, 300, 400, 500, 600] * 2
You might want to add a "Market" column, perhaps by defining a mapping from a product to a market, and adding it to your dataframe. Similarly, you could compute the quarter for each entry (although in your example, you seem to be saying that you want to treat the date object as the quarter).
products_to_markets = {
"A": "USA", "B": "USA", "C": "USA",
"D": "Canada", "E": "Canada", "F": "Canada"
}
df["Market"] = df["Product"].map(products_to_markets)
df["Quarter"] = df["Date"].dt.to_period("Q")
Now you can begin to perform some of the other calculations you're interested in. For instance, you can see the total value per market per quarter:
df.groupby(["Quarter", "Market"]).sum()
I think what you're looking for is something like this:
value_per_quarter = df.groupby("Quarter").sum()
df.groupby(["Quarter", "Market"]).sum() / value_per_quarter
Which yields:
Value
Quarter Market
2020Q1 Canada 0.714286
USA 0.285714
Upvotes: 1
Reputation: 12838
First you have to create a dataframe which maps your products to your markets.
Then use pd.crosstab()
to get a nice pivot table with argument normalize=index
giving you the percentages per row.
import pandas as pd
from io import StringIO
text = """
Product Date Value
0 A 1/6/2020 100
1 B 1/6/2020 200
2 C 1/6/2020 300
3 D 1/6/2020 400
4 E 1/6/2020 500
5 F 1/6/2020 600
6 A 1/9/2020 600
7 B 1/9/2020 500
8 C 1/9/2020 400
9 D 1/9/2020 300
10 E 1/9/2020 200
11 F 1/9/2020 100
"""
# create sample dataframe
df = pd.read_csv(StringIO(text), header=0, sep='\s+')
# create translation of products to markets
market_df = pd.DataFrame([
['A', 1], ['B', 1], ['C', 1],
['D', 2], ['E', 3], ['F', 4]],
columns=['Product', 'Market'],
)
# merge to get products mapped to markets
merged_df = pd.merge(
df,
market_df,
how='left',
on='Product',
)
# crosstab calculates totals per market and date
# normalize='index' calculates percentages over rows
pd.crosstab(
merged_df['Date'],
merged_df['Market'],
merged_df['Value'],
aggfunc='sum',
normalize='index',
)
Resulting dataframe:
Market
Date 1 2 3 4
1/6/2020 0.285714 0.190476 0.238095 0.285714
1/9/2020 0.714286 0.142857 0.095238 0.047619
Upvotes: 1