Reputation: 53
I'm trying to find the largest subset sum of a particular data set, where the average of a field in the data set matches predetermined criteria.
For example, say I have a list of people's weights (example below) and my goal is to find the largest weight total where the average weight of the resulting group is between 200 and 201 pounds.
Using the above, the largest sum of weights where the average weight is between 200 and 201 pounds comes from persons 1, 2, and 3. The sum of their weights is 601, and their average weight is about 200.3.
Is there a way to program this other than brute force, preferably in Python? I'm not even sure where to start researching this, so any help or guidance is appreciated.
Upvotes: 1
Views: 98
Reputation: 77885
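Start by translating the desired range so that it begins at 0, just for convenience: subtract the lower bound (200) from every weight. The midpoint is also a good choice of offset.
This makes your data set [10, 1, -10, 20, -12]. The sum of the full set is 9; for a subset of size k to qualify, its translated sum must lie in the range 0 to upper_bound * k, where upper_bound is the translated upper bound (201 - 200 = 1 here). The full set fails, since 9 is greater than 1 * 5.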
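As a small illustration of the translation step, the sketch below shifts the weights and tests a candidate subset against the translated range. The weights are an assumption, reconstructed by adding 200 back to the translated list in this answer; `LOW`, `HIGH`, and `mean_in_range` are hypothetical names chosen for the example.

```python
LOW, HIGH = 200, 201
# Assumed input: the translated list from this answer with 200 added back.
weights = [210, 201, 190, 220, 188]

# Shift every weight so the lower bound of the target range becomes 0.
shifted = [w - LOW for w in weights]

def mean_in_range(subset):
    """True if the subset's mean is in [LOW, HIGH], tested via the shifted sum."""
    s = sum(w - LOW for w in subset)
    return 0 <= s <= (HIGH - LOW) * len(subset)

print(shifted)                          # [10, 1, -10, 20, -12]
print(sum(shifted))                     # 9
print(mean_in_range(weights))           # False: 9 > 1 * 5
print(mean_in_range([210, 201, 190]))   # True: 0 <= 1 <= 1 * 3
```

Working with shifted sums avoids dividing by the subset size, so the check stays in integer arithmetic when the weights and bounds are integers.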
This gives you a tractable variation of the "target sum" problem: find a subset of the list that satisfies the sum constraint. In this case, [10, 1, -10] qualifies (its sum is 1, within 0 to 3), whereas the near miss [10, 1, -12] sums to -1 and falls just below the range. You can find qualifying subsets by extending the customary target-sum approach to handle a changing target: because the allowed range grows with the subset size, the "remaining amount" has to be adjusted each time an element is added.
Can you finish from there?
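One possible way to finish, offered as a sketch rather than the definitive solution: track every reachable (subset size, translated sum) pair, then keep the qualifying pair with the largest total. The function name `best_subset` and the input weights are assumptions (the weights are the translated list above with 200 added back). The number of states is bounded by the subset size times the range of possible sums, so this tends to beat enumerating all subsets when the weights are integers in a modest range, although it can still grow large.

```python
def best_subset(weights, low, high):
    """Return (total, subset) with the largest total whose mean lies in
    [low, high], or None if no non-empty subset qualifies."""
    span = high - low
    # reachable[(count, shifted_sum)] = indices of one subset reaching that state
    reachable = {(0, 0): ()}
    for i, w in enumerate(weights):
        # Iterate over a snapshot so each weight is used at most once.
        for (count, s), idxs in list(reachable.items()):
            key = (count + 1, s + w - low)
            reachable.setdefault(key, idxs + (i,))
    best = None
    for (count, s), idxs in reachable.items():
        # A subset of size `count` qualifies if 0 <= shifted sum <= span * count.
        if count and 0 <= s <= span * count:
            total = s + low * count  # undo the shift
            if best is None or total > best[0]:
                best = (total, [weights[i] for i in idxs])
    return best

# Assumed input: the translated list from this answer with 200 added back.
weights = [210, 201, 190, 220, 188]
print(best_subset(weights, 200, 201))  # (601, [210, 201, 190])
```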
Upvotes: 1
Reputation: 16162
There are many ways to do this, but Pandas is your friend.
import pandas as pd
df = pd.DataFrame({'weight': [209, 203, 190, 220, 188, 193]})
df = df.rolling(3).mean()  # mean of each window of 3 consecutive rows
df.query('200 <= weight <= 201').max()  # largest window mean within the range
Here we create a DataFrame from the weights, take the rolling mean over every 3 consecutive weights, and then select the largest mean that falls between 200 and 201 lbs. Note that this only considers groups of three adjacent rows, not arbitrary subsets.
output:
weight 200.666667
dtype: float64
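Since the question asks for the largest sum rather than the largest average, a variation on the same idea could keep each window's total next to its mean. This is only a sketch under the same restriction (windows of 3 consecutive rows); the column names `total` and `mean` and the variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'weight': [209, 203, 190, 220, 188, 193]})

# Total and mean of each window of 3 consecutive rows; drop incomplete windows.
windows = pd.DataFrame({
    'total': df['weight'].rolling(3).sum(),
    'mean': df['weight'].rolling(3).mean(),
}).dropna()

# Keep windows whose mean is within [200, 201] (inclusive), then take the largest total.
in_range = windows[windows['mean'].between(200, 201)]
best = in_range.loc[in_range['total'].idxmax()]
print(best)
```

With this data, the selected window is the one ending at row index 2, with a total of 602.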
Upvotes: 0