ZackK
ZackK

Reputation: 446

Way to solve constraint satisfaction faster than brute force?

I have a CSV that provides a y value for three different x values for each row. When read into a pandas DataFrame, it looks like this:

     5     10    20
0 -13.6 -10.7 -10.3
1 -14.1 -11.2 -10.8
2 -12.3  -9.4  -9.0

That is, for row 0, at 5 the value is -13.6, at 10 the value is -10.7, and at 20 the value is -10.3. These values are the result of an algorithm in the form:

def calc(x, r, b, c, d):
    if x < 10:
        y = (x * r + b) / x
    elif x >= 10 and x < 20:
        y = ((x * r) + (b - c)) / x
    else:
        y = ((x * r) + (b - d)) / x
    return y

I want to find the value of r, b, c, and d for each row. I know certain things about each of the values. For example, for each row: r is in np.arange(-.05, -.11, -.01), b is in np.arange(0, -20.05, -.05), and c and d are in np.arange(0, 85, 5). I also know that d is <= c.

Currently, I am solving this with brute force. For each row, I iterate through every combination of r, b, c, and d and test if the value at the three x values is equal to the known value from the DataFrame. This works, giving me a few combinations for each row that are basically the same except for rounding differences.

The problem is that this approach takes a long time when I need to run it against 2,000+ rows. My question is: is there a faster way than iterating and testing every combination? My understanding is that this is a constraint satisfaction problem but, after that, I have no idea what to narrow in on; there are so many types of constraint satisfaction problems (it seems) that I'm still lost (I'm not even certain that this is such a problem!). Any help in pointing me in the right direction would be greatly appreciated.

Upvotes: 1

Views: 825

Answers (1)

sascha
sascha

Reputation: 33522

I hope i understood the task correctly.

If you know the resolution/discretization of the parameters, it looks like a discrete-optimization problem (in general: hard), which could be solved by CP-approaches.

But if you allow these values to be continuous (and reformulate the formulas), it is:

  • (1) a Linear Program: if checking for feasible values (there needs to be a valid solution)
  • (2) a Linear Program: if optimizing parameters for minimization of sum of absolute differences (=errors)
  • (3) a Quadratic Program: if optimizing parameters for minimization of sum of squared differences (=errors) / equivalent to minimizing euclidean-norm

All three versions can be solved efficiently!

Here is a non-general (could be easily generalized) implementation of (3) using cvxpy to formulate the problem and ecos to solve the QP. Both tools are open-source.

Code

import numpy as np
import time
from cvxpy import *
from random import uniform

""" GENERATE TEST DATA """
def sample_params():
    while True:
        r = uniform(-0.11, -0.05)
        b = uniform(-20.05, 0)
        c = uniform(0, 85)
        d = uniform(0, 85)
        if d <= c:
            return r, b, c, d

def calc(x, r, b, c, d):
    if x < 10:
        y = (x * r + b) / x
    elif x >= 10 and x < 20:
        y = ((x * r) + (b - c)) / x
    else:
        y = ((x * r) + (b - d)) / x
    return y

N = 2000
sampled_params = [sample_params() for i in range(N)]
data_5 = np.array([calc(5, *sampled_params[i]) for i in range(N)])
data_10 = np.array([calc(10, *sampled_params[i]) for i in range(N)])
data_20 = np.array([calc(20, *sampled_params[i]) for i in range(N)])
data = np.empty((N, 3))
for i in range(N):
    data[i, :] = [data_5[i], data_10[i], data_20[i]]

""" SOLVER """
def solve(row):
    """ vars """
    R = Variable(1)
    B = Variable(1)
    C = Variable(1)
    D = Variable(1)
    E = Variable(3)

    """ constraints """
    constraints = []
    # bounds
    constraints.append(R >= -.11)
    constraints.append(R <= -.05)
    constraints.append(B >= -20.05)
    constraints.append(B <= 0.0)
    constraints.append(C >= 0.0)
    constraints.append(C <= 85.0)
    constraints.append(D >= 0.0)
    constraints.append(D <= 85.0)
    constraints.append(D <= C)
    # formula of model
    constraints.append((1.0 / 5.0) * B + R == row[0] + E[0])  # alternate function form: b/x+r
    constraints.append((1.0 / 10.0)  * B - (1.0 / 10.0) * C == row[1] + E[1])  # alternate function form: b/x-c/x+r
    constraints.append((1.0 / 20.0)  * B - (1.0 / 20.0) * D == row[2] + E[2])  # alternate function form: b/x-d/x+r

    """ Objective """
    objective = Minimize(norm(E, 2))

    """ Solve """
    problem = Problem(objective, constraints)
    problem.solve(solver=ECOS, verbose=False)
    return R.value, B.value, C.value, D.value, E.value

start = time.time()
for i in range(N):
    r, b, c, d, e = solve(data[i])
end = time.time()

print('seconds taken: ', end-start)
print('seconds per row: ', (end-start) / N)

Output

('seconds taken: ', 20.620506048202515)
('seconds per row: ', 0.010310253024101258)

Upvotes: 1

Related Questions