Reputation: 29
I have two different dataframes, one containing the Net Revenue by SKU and Supplier and another one containing the stock of SKUs in each store. I need to get an average by Supplier of the stores that contains the SKUs that compouse up to 90% the net revenue of the supplier. It's a bit complicated but I will exemplify, and I hope it can make it clear. Please, note that if 3 SKUs compose 89% of the revenue, we need to consider another one.
Example:
Dataframe 1 - Net Revenue
Supplier | SKU | Net Revenue |
---|---|---|
UNILEVER | 1111 | 10000 |
UNILEVER | 2222 | 50000 |
UNILEVER | 3333 | 500 |
PEPSICO | 1313 | 680 |
PEPSICO | 2424 | 10000 |
PEPSICO | 2323 | 450 |
Dataframe 2 - Stock
Store | SKU | Stock |
---|---|---|
1 | 1111 | 1 |
1 | 2222 | 2 |
1 | 3333 | 1 |
2 | 1111 | 1 |
2 | 2222 | 0 |
2 | 3333 | 1 |
In this case, for UNILEVER, we need to discard SKU 3333 because its net revenue is not relevant (as 1111 and 2222 already compouse more than 90% of the total net revenue of UNILEVER). Coverage in this case will be 1.5 (we have 1111 in 2 stores and 2222 in one store: (1+2)/2).
Result is something like this:
Supplier | Coverage |
---|---|
UNILEVER | 1.5 |
PEPSICO | ... |
Please, note that the real dataset has a different number of SKUs by supplier and a huge number of suppliers (around 150), so performance doesn't need to be PRIORITY but it has to be considered.
Thanks in advance, guys.
Upvotes: 1
Views: 186
Reputation: 136
Calculate the cumulative sum grouping by Suppler and divide by the Supplier Total Revenue.
Then find each Supplier Revenue Threshold by getting the minimum Cumulative Revenue Percentage under 90%.
Then you can get the list of SKUs by Supplier and calculate the coverage.
import pandas as pd
df = pd.DataFrame([
['UNILEVER', '1111', 10000],
['UNILEVER', '2222', 50000],
['UNILEVER', '3333', 500],
['PEPSICO', '1313', 680],
['PEPSICO', '2424', 10000],
['PEPSICO', '2323', 450],
], columns=['Supplier', 'SKU', 'Net Revenue'])
total_revenue_by_supplier = df.groupby(df['Supplier']).sum().reset_index()
total_revenue_by_supplier.columns = ['Supplier', 'Total Revenue']
df = df.sort_values(['Supplier', 'Net Revenue'], ascending=[True, False])
df['cumsum'] = df.groupby(df['Supplier'])['Net Revenue'].transform(pd.Series.cumsum)
df = df.merge(total_revenue_by_supplier, on='Supplier')
df['cumpercentage'] = df['cumsum'] / df['Total Revenue']
min_before_threshold = df[df['cumpercentage'] >= 0.9][['Supplier', 'cumpercentage']].groupby('Supplier').min().reset_index()
min_before_threshold.columns = ['Supplier', 'Revenue Threshold']
df = df.merge(min_before_threshold, on='Supplier')
df = df[df['cumpercentage'] <= df['Revenue Threshold']][['Supplier', 'SKU', 'Net Revenue']]
df
Upvotes: 1