How can I get rows that compouse up to 90% of a sum?

Question

I have two different dataframes, one containing the Net Revenue by SKU and Supplier and another one containing the stock of SKUs in each store. I need to get an average by Supplier of the stores that contains the SKUs that compouse up to 90% the net revenue of the supplier. It's a bit complicated but I will exemplify, and I hope it can make it clear. Please, note that if 3 SKUs compose 89% of the revenue, we need to consider another one.

Example:

Dataframe 1 - Net Revenue

Supplier	SKU	Net Revenue
UNILEVER	1111	10000
UNILEVER	2222	50000
UNILEVER	3333	500
PEPSICO	1313	680
PEPSICO	2424	10000
PEPSICO	2323	450

Dataframe 2 - Stock

Store	SKU	Stock
1	1111	1
1	2222	2
1	3333	1
2	1111	1
2	2222	0
2	3333	1

In this case, for UNILEVER, we need to discard SKU 3333 because its net revenue is not relevant (as 1111 and 2222 already compouse more than 90% of the total net revenue of UNILEVER). Coverage in this case will be 1.5 (we have 1111 in 2 stores and 2222 in one store: (1+2)/2).

Result is something like this:

Supplier	Coverage
UNILEVER	1.5
PEPSICO	...

Please, note that the real dataset has a different number of SKUs by supplier and a huge number of suppliers (around 150), so performance doesn't need to be PRIORITY but it has to be considered.

Thanks in advance, guys.

Jose HLS · Accepted Answer

Calculate the cumulative sum grouping by Suppler and divide by the Supplier Total Revenue.

Then find each Supplier Revenue Threshold by getting the minimum Cumulative Revenue Percentage under 90%.

Then you can get the list of SKUs by Supplier and calculate the coverage.

import pandas as pd

df = pd.DataFrame([
    ['UNILEVER', '1111', 10000], 
    ['UNILEVER', '2222', 50000], 
    ['UNILEVER', '3333', 500], 
    ['PEPSICO', '1313', 680], 
    ['PEPSICO', '2424', 10000], 
    ['PEPSICO', '2323', 450], 
], columns=['Supplier', 'SKU', 'Net Revenue'])

total_revenue_by_supplier = df.groupby(df['Supplier']).sum().reset_index()
total_revenue_by_supplier.columns = ['Supplier', 'Total Revenue']

df = df.sort_values(['Supplier', 'Net Revenue'], ascending=[True, False])

df['cumsum'] = df.groupby(df['Supplier'])['Net Revenue'].transform(pd.Series.cumsum)

df = df.merge(total_revenue_by_supplier, on='Supplier')

df['cumpercentage'] = df['cumsum'] / df['Total Revenue']

min_before_threshold = df[df['cumpercentage'] >= 0.9][['Supplier', 'cumpercentage']].groupby('Supplier').min().reset_index()
min_before_threshold.columns = ['Supplier', 'Revenue Threshold']

df = df.merge(min_before_threshold, on='Supplier')

df = df[df['cumpercentage'] <= df['Revenue Threshold']][['Supplier', 'SKU', 'Net Revenue']]

df

How can I get rows that compouse up to 90% of a sum?

Answers (1)

Related Questions