n1000
n1000

Reputation: 5314

How to efficiently do a grid search for parameter combinations in Python?

Problem

For a computation engineering model, I want to do a grid search for all feasible parameter combinations. Each parameter has a certain possibility range, e.g. (0 … 100) and the parameter combination must fulfil the condition a+b+c=100. An example:

ranges = {
    'a': (95, 99), 
    'b': (1, 4), 
    'c': (1, 2)}
increment = 1.0
target = 100.0

So the combinations that fulfil the condition a+b+c=100 are:

[(95, 4, 1), (95, 3, 2), (96, 2, 2), (96, 3, 1), (97, 1, 2), (97, 2, 1), (98, 1, 1)]  

This algorithm should run with any number of parameters, range lengths, and increments.

My solutions (so far)

The solutions I have come up with are all brute-forcing the problem. That means calculating all combinations and then discarding the ones that do not fulfil the given condition:

def solution1(ranges, increment, target):
    combinations = []
    for parameter in ranges:
        combinations.append(list(np.arange(ranges[parameter][0], ranges[parameter][1], increment)))
        # np.arange() is exclusive of the upper bound, let's fix that
        if combinations[-1][-1] != ranges[parameter][1]:
            combinations[-1].append(ranges[parameter][1])
    combinations = list(itertools.product(*combinations))
    df = pd.DataFrame(combinations, columns=ranges.keys())
    # using np.isclose() so that the algorithm works for floats
    return df[np.isclose(df.sum(axis=1), target)]

Since I ran into RAM problems with solution1(), I used itertools.product as an iterator.

def solution2(ranges, increment, target):
    combinations = []
    for parameter in ranges:
        combinations.append(list(np.arange(ranges[parameter][0], ranges[parameter][1], increment)))
        # np.arange() is exclusive of the upper bound, let's fix that
        if combinations[-1][-1] != ranges[parameter][1]:
            combinations[-1].append(ranges[parameter][1])
    result = []
    for combination in itertools.product(*combinations):
        # using np.isclose() so that the algorithm works for floats
        if np.isclose(sum(combination), target):
            result.append(combination)
    df = pd.DataFrame(result, columns=ranges.keys())
    return df

However, this quickly takes a few days to compute. Hence, both solutions are not viable for large number of parameters and ranges. For instance, one set that I am trying to solve is (already unpacked combinations variable):

[[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0], [22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [0.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0], [0.0]]

This results in memory use of >40 GB for solution1() and calculation time >400 hours for solution2().

Question

Do you see a solution that is either faster or more intelligent, i.e. not trying to brute-force the problem?

P.S.: I am not 100% sure if this question would be a better fit on one of the other Stackexchange sites. Please suggest in the comments if you think it should be moved and I will delete it here.

Upvotes: 0

Views: 969

Answers (1)

idontseethepoint
idontseethepoint

Reputation: 483

Here is a recursive solution:

a = [95, 100]
b = [1, 4]
c = [1, 2]

Params = (a, b, c)

def GetValidParamValues(Params, constriantSum, prevVals):
    validParamValues = []
    if (len(Params) == 1):
        if (constriantSum >= Params[0][0] and constriantSum <= Params[0][1]):
            validParamValues.append(constriantSum)
        for v in validParamValues:
            print(prevVals + v)
        return
    sumOfLowParams = sum([Params[x][0] for x in range(1, len(Params))])
    sumOfHighParams = sum([Params[x][1] for x in range(1, len(Params))])
    lowEnd = max(Params[0][0], constriantSum - sumOfHighParams)
    highEnd = min(Params[0][1], constriantSum - sumOfLowParams) + 1
    if (len(Params) == 2):
        for av in range(lowEnd, highEnd):
            bv  = constriantSum - av
            if (bv <= Params[1][1]):
                validParamValues.append([av, bv])
        for v in validParamValues:
            print(prevVals + v)
        return
    for av in range(lowEnd, highEnd):
        nexPrevVals = prevVals + [av]
        subSeParams = Params[1:]
        GetValidParamValues(subSeParams, constriantSum - av, nexPrevVals)


GetValidParamValues(Params, 100)

The idea is that if there were 2 parameters, a and b, we could list all the valid pairs by passing through the values of a, and taking (ai, S - ai) and just checking if S-ai is a valid value for b.

This is improved on since we can calculate ahead of time which values of ai will make S-ai a valid value for b, so we never check values that don't work.

When the number of params is more than 2, we can again look at every valid value of ai, and we know the sum of the other numbers must be S - ai. So the only thing we need is every possible way for the other numbers to add to S - ai, which is the same problem with one fewer parameter. So by using recursion we can get it go all the way down to size 2 and solve it.

Upvotes: 1

Related Questions