Reputation: 294
I have data consisting of several values. Each value has a positive weight and a rank. I would like to return a sublist of these values such that values are added by increasing rank to the list until the sum of their weights is the smallest possible greater or equal to a given target, with the additional constraint that if a value is added, all of those with that same rank must be added, ie values are added to the list by ranking group.
I have written the alogorithm below, which is quite slow and not so easy to read. The input is a list of tuples (we can assume it is already sorted by rank); each tuple contains rank, value and weight. All that needs to be returned is the list of kept values, and their total weight. The algorithm groups the data by rank, then figures out how many ranks to add to reach the proper cumulative weight. Any suggestions to accelerate and improve?
For the input list [(1, 'a', 0.10131488885239891), (1, 'b', 0.05189220940446056), (1, 'c', 0.05233086643211586), (4, 'd', 0.019651238401109394), (5, 'e', 0.11439063810128784), (6, 'f', 0.003242047085661822), (6, 'g', 0.03093925229983648), (8, 'h', 0.09737202015393008), (8, 'i', 0.02024957487257707), (10, 'j', 0.08454098571100034), (11, 'k', 0.18524611312188527), (11, 'l', 0.07110968996322396), (11, 'm', 0.023236464232769254), (14, 'n', 0.14448401136774303)]
and target weight 0.47
, it returns the sublist ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
and kept weight 0.49138273560337803
.
weightGoal = 0.47 # Target weight
# Example of data; each tuple has (rank, value and weight)
individualData = [(1, 'a', 0.10131488885239891), (1, 'b', 0.05189220940446056), (1, 'c', 0.05233086643211586), (4, 'd', 0.019651238401109394), (5, 'e', 0.11439063810128784), (6, 'f', 0.003242047085661822), (6, 'g', 0.03093925229983648), (8, 'h', 0.09737202015393008), (8, 'i', 0.02024957487257707), (10, 'j', 0.08454098571100034), (11, 'k', 0.18524611312188527), (11, 'l', 0.07110968996322396), (11, 'm', 0.023236464232769254), (14, 'n', 0.14448401136774303)]
print("Input", individualData, weightGoal)
# Determine cumulative weight for each value.
vCumulWeight = [0]
for ind in individualData: vCumulWeight.append(vCumulWeight[-1] + ind[-1])
vCumulWeight.pop(0)
# To group by rank
ranks = list(set([v[0] for v in individualData])) # Unique values of the rank
ranks.sort(reverse=False) # Sort the list by rank
rankWeights = [] # Total weight of values with same rank
rankValues = [] # Values with same rank
# Group points by rank
for r in ranks:
rankW, rankV = 0, []
for ind in individualData:
if r == ind[0]:
rankW += ind[-1]
rankV.append(ind[1])
rankWeights.append(rankW)
rankValues.append(rankV)
# Get cumulative weight of each group
rankCumulWeight = [0]
for op in rankWeights: rankCumulWeight.append(rankCumulWeight[-1] + op)
rankCumulWeight.pop(0)
# Rank all values and all groups
fullRankingVals, fullRankingGroups, rnk, prevLen, prevCP = [], [], 0, 1, 0
for i, (o, pr, ov, cp) in enumerate(zip(ranks, rankWeights, rankValues, rankCumulWeight)):
fullRankingGroups.append((i+1, ov, o, pr, cp, ((cp <= weightGoal) or (prevCP < weightGoal))))
rnk += prevLen
for v in ov: fullRankingVals.append((rnk, v, o, pr/len(ov), cp, ((cp <= weightGoal) or (prevCP < weightGoal))))
prevLen = len(ov)
prevCP = cp
print("============ Check by value")
for v in fullRankingVals: print(v)
print("============ Check by group")
for v in fullRankingGroups: print(v)
print("============")
## Actual acceptance
tmpAccepted = [v for v in fullRankingVals if (v[-1])]
acceptedValues = [v[1] for v in tmpAccepted]
accdeptedWeight = max(v[-2] for v in tmpAccepted)
print("Result", acceptedValues, accdeptedWeight)
Upvotes: 1
Views: 481
Reputation: 14949
With Pandas
:
import pandas as pd
df = pd.DataFrame(l, columns=['rank', 'value', 'weight'])
df = df.groupby('rank').agg({'value': list, 'weight': sum})
m = ~((df.weight.cumsum() > 0.47).shift(
fill_value=False))
result = df[m]['value'].explode().to_list()
weight = df[m]['weight'].cumsum().iloc[-1]
Upvotes: 1
Reputation: 718
Might I suggest that you take the easiest approach possible, and just solve the problem in small chunks?
Let's break it down into 3 small steps so that you get to follow along what is actually happening:
rank
as a key.storage
until we reach the limit weightGoal
weightGoal
is reached, add to our storage
if the value
is the same as the last one we added to our storage
Now let's code the above flow:
weightGoal = 0.47 # Target weight
# Example of data; each tuple has (rank, value and weight)
individualData = [(1, 'a', 0.10131488885239891), (1, 'b', 0.05189220940446056), (1, 'c', 0.05233086643211586), (4, 'd', 0.019651238401109394), (5, 'e', 0.11439063810128784), (6, 'f', 0.003242047085661822), (6, 'g', 0.03093925229983648), (8, 'h', 0.09737202015393008), (8, 'i', 0.02024957487257707), (10, 'j', 0.08454098571100034), (11, 'k', 0.18524611312188527), (11, 'l', 0.07110968996322396), (11, 'm', 0.023236464232769254), (14, 'n', 0.14448401136774303)]
# step 1
individualData.sort(key=lambda x: x[0])
storage = 0
last = None
for data in individualData:
if storage < weightGoal: # step 2
last = data
storage += data[2]
elif last and data[1] == last[1]: # step 3
storage += data[2]
# no need to change `last` here unless you really want to
else: # termination clause
break
Now the final summation
is stored in the storage
variable and you also get the bonus of knowing which element was the last to add using the last
variable.
Let me know if there's any ambiguity.
Upvotes: 1