Reputation: 5314
For a computation engineering model, I want to do a grid search for all feasible parameter combinations. Each parameter has a certain possibility range, e.g. (0 … 100) and the parameter combination must fulfil the condition a+b+c=100
. An example:
ranges = {
'a': (95, 99),
'b': (1, 4),
'c': (1, 2)}
increment = 1.0
target = 100.0
So the combinations that fulfil the condition a+b+c=100
are:
[(95, 4, 1), (95, 3, 2), (96, 2, 2), (96, 3, 1), (97, 1, 2), (97, 2, 1), (98, 1, 1)]
This algorithm should run with any number of parameters, range lengths, and increments.
The solutions I have come up with are all brute-forcing the problem. That means calculating all combinations and then discarding the ones that do not fulfil the given condition:
def solution1(ranges, increment, target):
combinations = []
for parameter in ranges:
combinations.append(list(np.arange(ranges[parameter][0], ranges[parameter][1], increment)))
# np.arange() is exclusive of the upper bound, let's fix that
if combinations[-1][-1] != ranges[parameter][1]:
combinations[-1].append(ranges[parameter][1])
combinations = list(itertools.product(*combinations))
df = pd.DataFrame(combinations, columns=ranges.keys())
# using np.isclose() so that the algorithm works for floats
return df[np.isclose(df.sum(axis=1), target)]
Since I ran into RAM problems with solution1()
, I used itertools.product
as an iterator.
def solution2(ranges, increment, target):
combinations = []
for parameter in ranges:
combinations.append(list(np.arange(ranges[parameter][0], ranges[parameter][1], increment)))
# np.arange() is exclusive of the upper bound, let's fix that
if combinations[-1][-1] != ranges[parameter][1]:
combinations[-1].append(ranges[parameter][1])
result = []
for combination in itertools.product(*combinations):
# using np.isclose() so that the algorithm works for floats
if np.isclose(sum(combination), target):
result.append(combination)
df = pd.DataFrame(result, columns=ranges.keys())
return df
However, this quickly takes a few days to compute. Hence, both solutions are not viable for large number of parameters and ranges. For instance, one set that I am trying to solve is (already unpacked combinations
variable):
[[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0], [22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [0.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0], [0.0]]
This results in memory use of >40 GB for solution1()
and calculation time >400 hours for solution2()
.
Do you see a solution that is either faster or more intelligent, i.e. not trying to brute-force the problem?
P.S.: I am not 100% sure if this question would be a better fit on one of the other Stackexchange sites. Please suggest in the comments if you think it should be moved and I will delete it here.
Upvotes: 0
Views: 969
Reputation: 483
Here is a recursive solution:
a = [95, 100]
b = [1, 4]
c = [1, 2]
Params = (a, b, c)
def GetValidParamValues(Params, constriantSum, prevVals):
validParamValues = []
if (len(Params) == 1):
if (constriantSum >= Params[0][0] and constriantSum <= Params[0][1]):
validParamValues.append(constriantSum)
for v in validParamValues:
print(prevVals + v)
return
sumOfLowParams = sum([Params[x][0] for x in range(1, len(Params))])
sumOfHighParams = sum([Params[x][1] for x in range(1, len(Params))])
lowEnd = max(Params[0][0], constriantSum - sumOfHighParams)
highEnd = min(Params[0][1], constriantSum - sumOfLowParams) + 1
if (len(Params) == 2):
for av in range(lowEnd, highEnd):
bv = constriantSum - av
if (bv <= Params[1][1]):
validParamValues.append([av, bv])
for v in validParamValues:
print(prevVals + v)
return
for av in range(lowEnd, highEnd):
nexPrevVals = prevVals + [av]
subSeParams = Params[1:]
GetValidParamValues(subSeParams, constriantSum - av, nexPrevVals)
GetValidParamValues(Params, 100)
The idea is that if there were 2 parameters, a
and b
, we could list all the valid pairs by passing through the values of a
, and taking (ai, S - ai)
and just checking if S-ai
is a valid value for b
.
This is improved on since we can calculate ahead of time which values of ai
will make S-ai
a valid value for b
, so we never check values that don't work.
When the number of params is more than 2, we can again look at every valid value of ai
, and we know the sum of the other numbers must be S - ai
. So the only thing we need is every possible way for the other numbers to add to S - ai
, which is the same problem with one fewer parameter. So by using recursion we can get it go all the way down to size 2 and solve it.
Upvotes: 1