Lucas Werner
Lucas Werner

Reputation: 29

How can I get rows that compouse up to 90% of a sum?

I have two different dataframes, one containing the Net Revenue by SKU and Supplier and another one containing the stock of SKUs in each store. I need to get an average by Supplier of the stores that contains the SKUs that compouse up to 90% the net revenue of the supplier. It's a bit complicated but I will exemplify, and I hope it can make it clear. Please, note that if 3 SKUs compose 89% of the revenue, we need to consider another one.

Example:

Dataframe 1 - Net Revenue

Supplier SKU Net Revenue
UNILEVER 1111 10000
UNILEVER 2222 50000
UNILEVER 3333 500
PEPSICO 1313 680
PEPSICO 2424 10000
PEPSICO 2323 450

Dataframe 2 - Stock

Store SKU Stock
1 1111 1
1 2222 2
1 3333 1
2 1111 1
2 2222 0
2 3333 1

In this case, for UNILEVER, we need to discard SKU 3333 because its net revenue is not relevant (as 1111 and 2222 already compouse more than 90% of the total net revenue of UNILEVER). Coverage in this case will be 1.5 (we have 1111 in 2 stores and 2222 in one store: (1+2)/2).

Result is something like this:

Supplier Coverage
UNILEVER 1.5
PEPSICO ...

Please, note that the real dataset has a different number of SKUs by supplier and a huge number of suppliers (around 150), so performance doesn't need to be PRIORITY but it has to be considered.

Thanks in advance, guys.

Upvotes: 1

Views: 186

Answers (1)

Jose HLS
Jose HLS

Reputation: 136

Calculate the cumulative sum grouping by Suppler and divide by the Supplier Total Revenue.

Then find each Supplier Revenue Threshold by getting the minimum Cumulative Revenue Percentage under 90%.

Then you can get the list of SKUs by Supplier and calculate the coverage.

import pandas as pd

df = pd.DataFrame([
    ['UNILEVER', '1111', 10000], 
    ['UNILEVER', '2222', 50000], 
    ['UNILEVER', '3333', 500], 
    ['PEPSICO', '1313', 680], 
    ['PEPSICO', '2424', 10000], 
    ['PEPSICO', '2323', 450], 
], columns=['Supplier', 'SKU', 'Net Revenue'])

total_revenue_by_supplier = df.groupby(df['Supplier']).sum().reset_index()
total_revenue_by_supplier.columns = ['Supplier', 'Total Revenue']

df = df.sort_values(['Supplier', 'Net Revenue'], ascending=[True, False])

df['cumsum'] = df.groupby(df['Supplier'])['Net Revenue'].transform(pd.Series.cumsum)

df = df.merge(total_revenue_by_supplier, on='Supplier')

df['cumpercentage'] = df['cumsum'] / df['Total Revenue']

min_before_threshold = df[df['cumpercentage'] >= 0.9][['Supplier', 'cumpercentage']].groupby('Supplier').min().reset_index()
min_before_threshold.columns = ['Supplier', 'Revenue Threshold']

df = df.merge(min_before_threshold, on='Supplier')

df = df[df['cumpercentage'] <= df['Revenue Threshold']][['Supplier', 'SKU', 'Net Revenue']]

df

Upvotes: 1

Related Questions