Grouping and sorting weighted values by rank

Question

I have data consisting of several values. Each value has a positive weight and a rank. I would like to return a sublist of these values such that values are added by increasing rank to the list until the sum of their weights is the smallest possible greater or equal to a given target, with the additional constraint that if a value is added, all of those with that same rank must be added, ie values are added to the list by ranking group.

I have written the alogorithm below, which is quite slow and not so easy to read. The input is a list of tuples (we can assume it is already sorted by rank); each tuple contains rank, value and weight. All that needs to be returned is the list of kept values, and their total weight. The algorithm groups the data by rank, then figures out how many ranks to add to reach the proper cumulative weight. Any suggestions to accelerate and improve?

For the input list [(1, 'a', 0.10131488885239891), (1, 'b', 0.05189220940446056), (1, 'c', 0.05233086643211586), (4, 'd', 0.019651238401109394), (5, 'e', 0.11439063810128784), (6, 'f', 0.003242047085661822), (6, 'g', 0.03093925229983648), (8, 'h', 0.09737202015393008), (8, 'i', 0.02024957487257707), (10, 'j', 0.08454098571100034), (11, 'k', 0.18524611312188527), (11, 'l', 0.07110968996322396), (11, 'm', 0.023236464232769254), (14, 'n', 0.14448401136774303)] and target weight 0.47, it returns the sublist ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'] and kept weight 0.49138273560337803.

weightGoal = 0.47 # Target weight
    
# Example of data; each tuple has (rank, value and weight)
individualData = [(1, 'a', 0.10131488885239891), (1, 'b', 0.05189220940446056), (1, 'c', 0.05233086643211586), (4, 'd', 0.019651238401109394), (5, 'e', 0.11439063810128784), (6, 'f', 0.003242047085661822), (6, 'g', 0.03093925229983648), (8, 'h', 0.09737202015393008), (8, 'i', 0.02024957487257707), (10, 'j', 0.08454098571100034), (11, 'k', 0.18524611312188527), (11, 'l', 0.07110968996322396), (11, 'm', 0.023236464232769254), (14, 'n', 0.14448401136774303)]
print("Input", individualData, weightGoal)

# Determine cumulative weight for each value.
vCumulWeight = [0]
for ind in individualData: vCumulWeight.append(vCumulWeight[-1] + ind[-1])
vCumulWeight.pop(0)

# To group by rank    
ranks = list(set([v[0] for v in individualData])) # Unique values of the rank
ranks.sort(reverse=False) # Sort the list by rank
rankWeights = [] # Total weight of values with same rank
rankValues = [] # Values with same rank

# Group points by rank
for r in ranks:
    rankW, rankV = 0, []
    for ind in individualData:
        if r == ind[0]:
            rankW += ind[-1]
            rankV.append(ind[1])
    rankWeights.append(rankW)
    rankValues.append(rankV)
    
# Get cumulative weight of each group
rankCumulWeight = [0]
for op in rankWeights: rankCumulWeight.append(rankCumulWeight[-1] + op)
rankCumulWeight.pop(0)

# Rank all values and all groups
fullRankingVals, fullRankingGroups, rnk, prevLen, prevCP = [], [], 0, 1, 0
for i, (o, pr, ov, cp) in enumerate(zip(ranks, rankWeights, rankValues, rankCumulWeight)):
    fullRankingGroups.append((i+1, ov, o, pr, cp, ((cp <= weightGoal) or (prevCP < weightGoal))))
    rnk += prevLen
    for v in ov: fullRankingVals.append((rnk, v, o, pr/len(ov), cp, ((cp <= weightGoal) or (prevCP < weightGoal))))
    prevLen = len(ov)
    prevCP = cp

print("============ Check by value")
for v in fullRankingVals: print(v)
print("============ Check by group")
for v in fullRankingGroups: print(v)
print("============")

## Actual acceptance
tmpAccepted = [v for v in fullRankingVals if (v[-1])]
acceptedValues = [v[1] for v in tmpAccepted]
accdeptedWeight = max(v[-2] for v in tmpAccepted)
print("Result", acceptedValues, accdeptedWeight)

Nk03 · Accepted Answer

With Pandas:

import pandas as pd

df = pd.DataFrame(l, columns=['rank', 'value', 'weight'])
df = df.groupby('rank').agg({'value': list, 'weight': sum})

m = ~((df.weight.cumsum() > 0.47).shift(
    fill_value=False))

result = df[m]['value'].explode().to_list()
weight = df[m]['weight'].cumsum().iloc[-1]

Grouping and sorting weighted values by rank

Answers (2)

Related Questions