GEKKO Exception: @error: Max Equation Length (Number of variables greater than 100k)

Question

I need to run an optimization for 100k to 500k variables, but it gives me max equation length reached an error. Can anyone help me out to set up this problem? Time is not a constraint as long as it takes 3-4 hours to run, it's fine.

df1 = df_opt.head(100000).copy()

#initialize model
m= GEKKO()
m.options.SOLVER=1

#initialize variable
x =  np.array([m.Var(lb=0,ub=100,integer=True) for i in range(len(df1))])

#constraints
m.Equation(m.sum(x)<=30000)

#objective
responsiveness = np.array([m.Const(i) for i in df1['responsivness'].values])
affinity_score = np.array([m.Const(i) for i in df1['affinity'].values])
cost = np.array([m.Const(i) for i in df1['cost'].values])

expr = np.array([m.log(i) - k * j \
    for i,j,k in zip((1+responsiveness * affinity_score * x),x,cost)])

m.Obj(-(m.sum(expr)))

#optimization
m.solve(disp=False)

John Hedengren · Accepted Answer

When creating a question, it is important to have a Minimal Example that is complete. Here is a modification that creates a random DataFrame with n rows.

from gekko import GEKKO
import numpy as np
import pandas as pd

n = 10
df1 = pd.DataFrame({'responsivness':np.random.rand(n),\
                    'affinity':np.random.rand(n),\
                    'cost':np.random.rand(n)})
print(df1.head())

#initialize model
m= GEKKO(remote=False)
m.options.SOLVER=1

#initialize variable
x =  np.array([m.Var(lb=0,ub=100,integer=True) for i in range(len(df1))])

#constraints
m.Equation(m.sum(x)<=30000)

#objective
responsiveness = np.array([m.Const(i) for i in df1['responsivness'].values])
affinity_score = np.array([m.Const(i) for i in df1['affinity'].values])
cost = np.array([m.Const(i) for i in df1['cost'].values])

expr = np.array([m.log(i) - k * j \
    for i,j,k in zip((1+responsiveness * affinity_score * x),x,cost)])

m.Obj(-(m.sum(expr)))

#optimization
m.solve(disp=True)

This solves successfully for n=10 with the random numbers selected.

 --------- APM Model Size ------------
 Each time step contains
   Objects      :            0
   Constants    :           30
   Variables    :           11
   Intermediates:            0
   Connections  :            0
   Equations    :            2
   Residuals    :            2
 
 Number of state variables:             11
 Number of total equations: -            1
 Number of slack variables: -            1
 ---------------------------------------
 Degrees of freedom       :              9
 
 ----------------------------------------------
 Steady State Optimization with APOPT Solver
 ----------------------------------------------
Iter:     1 I:  0 Tm:      0.00 NLPi:   20 Dpth:    0 Lvs:    3 Obj: -1.35E+00 Gap:       NaN
--Integer Solution:  -1.34E+00 Lowest Leaf:  -1.35E+00 Gap:   4.73E-03
Iter:     2 I:  0 Tm:      0.00 NLPi:    2 Dpth:    1 Lvs:    3 Obj: -1.34E+00 Gap:  4.73E-03
 Successful solution
 
 ---------------------------------------------------
 Solver         :  APOPT (v1.0)
 Solution time  :   1.519999999436550E-002 sec
 Objective      :   -1.34078995171088     
 Successful solution
 ---------------------------------------------------

The underlying model gk_model0.apm can be accessed by navigating to m.path or by using m.open_folder().

Model
Constants
    i0 = 0.14255660947333681
    i1 = 0.9112789578520111
    i2 = 0.10526966142004568
    i3 = 0.6255161023214897
    i4 = 0.2434604974789274
    i5 = 0.812768922376058
    i6 = 0.555163868440599
    i7 = 0.7286240480266872
    i8 = 0.39643651685899695
    i9 = 0.4664238475079081
    i10 = 0.588654005219946
    i11 = 0.7807594551372589
    i12 = 0.623910408858981
    i13 = 0.19421798736230456
    i14 = 0.3061420839190525
    i15 = 0.07764492888189267
    i16 = 0.7276569154297892
    i17 = 0.5630014016669598
    i18 = 0.9633171115575193
    i19 = 0.23310692223695684
    i20 = 0.008089496373502647
    i21 = 0.7533529530133879
    i22 = 0.4218710975774087
    i23 = 0.03329287687223692
    i24 = 0.9136665338169284
    i25 = 0.7528330460265494
    i26 = 0.0810779357870034
    i27 = 0.4183140612726107
    i28 = 0.4381547602657835
    i29 = 0.907339329732971
End Constants
Variables
    int_v1 = 0, <= 100, >= 0
    int_v2 = 0, <= 100, >= 0
    int_v3 = 0, <= 100, >= 0
    int_v4 = 0, <= 100, >= 0
    int_v5 = 0, <= 100, >= 0
    int_v6 = 0, <= 100, >= 0
    int_v7 = 0, <= 100, >= 0
    int_v8 = 0, <= 100, >= 0
    int_v9 = 0, <= 100, >= 0
    int_v10 = 0, <= 100, >= 0
End Variables
Equations
    (((((((((int_v1+int_v2)+int_v3)+int_v4)+int_v5)+int_v6)+int_v7)+int_v8)+int_v9)+int_v10)<=30000
    minimize (-((((((((((log((1+((((i0)*(i10)))*(int_v1))))-((i20)*(int_v1)))+(log((1+((((i1)*(i11)))*(int_v2))))-((i21)*(int_v2))))+(log((1+((((i2)*(i12)))*(int_v3))))-((i22)*(int_v3))))+(log((1+((((i3)*(i13)))*(int_v4))))-((i23)*(int_v4))))+(log((1+((((i4)*(i14)))*(int_v5))))-((i24)*(int_v5))))+(log((1+((((i5)*(i15)))*(int_v6))))-((i25)*(int_v6))))+(log((1+((((i6)*(i16)))*(int_v7))))-((i26)*(int_v7))))+(log((1+((((i7)*(i17)))*(int_v8))))-((i27)*(int_v8))))+(log((1+((((i8)*(i18)))*(int_v9))))-((i28)*(int_v9))))+(log((1+((((i9)*(i19)))*(int_v10))))-((i29)*(int_v10)))))
End Equations

End Model

You can avoid a large symbolic expression string by modifying the model as:

from gekko import GEKKO
import numpy as np
import pandas as pd

n = 5000
df1 = pd.DataFrame({'responsiveness':np.random.rand(n),\
                    'affinity':np.random.rand(n),\
                    'cost':np.random.rand(n)})
print(df1.head())

#initialize model
m= GEKKO(remote=False)
m.options.SOLVER=1

#initialize variable
x =  np.array([m.Var(lb=0,ub=100,integer=True) for i in range(len(df1))])

#constraints
m.Equation(m.sum(list(x))<=30000)

#objective
responsiveness = df1['responsiveness'].values
affinity_score = df1['affinity'].values
cost           = df1['cost'].values
[m.Maximize(m.log(i) - k * j) \
    for i,j,k in zip((1+responsiveness * affinity_score * x),x,cost)]

#optimization
m.solve(disp=True)

m.open_folder()

This gives an underlying model of the following that does not increase in symbolic expression size with number of variables.

Model
Variables
    int_v1 = 0, <= 100, >= 0
    int_v2 = 0, <= 100, >= 0
    int_v3 = 0, <= 100, >= 0
    int_v4 = 0, <= 100, >= 0
    int_v5 = 0, <= 100, >= 0
    int_v6 = 0, <= 100, >= 0
    int_v7 = 0, <= 100, >= 0
    int_v8 = 0, <= 100, >= 0
    int_v9 = 0, <= 100, >= 0
    int_v10 = 0, <= 100, >= 0
    v11 = 0
End Variables
Equations
    v11<=30000
    maximize (log((1+((0.16283879947305288)*(int_v1))))-((0.365323493448101)*(int_v1)))
    maximize (log((1+((0.3509872155181691)*(int_v2))))-((0.12162206443479917)*(int_v2)))
    maximize (log((1+((0.20134572143617518)*(int_v3))))-((0.47137701674279087)*(int_v3)))
    maximize (log((1+((0.287818142242232)*(int_v4))))-((0.12042554857067544)*(int_v4)))
    maximize (log((1+((0.48997709502894166)*(int_v5))))-((0.21084485862098745)*(int_v5)))
    maximize (log((1+((0.6178277437136291)*(int_v6))))-((0.42602122419609056)*(int_v6)))
    maximize (log((1+((0.13033555293152563)*(int_v7))))-((0.8796057438355324)*(int_v7)))
    maximize (log((1+((0.5002025885707916)*(int_v8))))-((0.9703263879586648)*(int_v8)))
    maximize (log((1+((0.7095523321888202)*(int_v9))))-((0.8498606490337451)*(int_v9)))
    maximize (log((1+((0.6174815809937886)*(int_v10))))-((0.9390903075640681)*(int_v10)))
End Equations
Connections
    int_v1 = sum_1.x[1]
    int_v2 = sum_1.x[2]
    int_v3 = sum_1.x[3]
    int_v4 = sum_1.x[4]
    int_v5 = sum_1.x[5]
    int_v6 = sum_1.x[6]
    int_v7 = sum_1.x[7]
    int_v8 = sum_1.x[8]
    int_v9 = sum_1.x[9]
    int_v10 = sum_1.x[10]
    v11 = sum_1.y
End Connections
Objects
    sum_1 = sum(10)
End Objects

End Model

I fixed a bug in Gekko so you should be able to use m.Equation(m.sum(x)<=30000) on the next release of Gekko instead of converting x to a list. This modification now works for larger models that previously failed. I tested it with n=5000.

 Number of state variables:           5002
 Number of total equations: -            2
 Number of slack variables: -            1
 ---------------------------------------
 Degrees of freedom       :           4999
 
 ----------------------------------------------
 Steady State Optimization with APOPT Solver
 ----------------------------------------------
Iter:     1 I:  0 Tm:    313.38 NLPi:   14 Dpth:    0 Lvs:    3 Obj: -6.05E+02 Gap:       NaN
--Integer Solution:  -6.01E+02 Lowest Leaf:  -6.05E+02 Gap:   6.60E-03
Iter:     2 I:  0 Tm:      0.06 NLPi:    2 Dpth:    1 Lvs:    3 Obj: -6.01E+02 Gap:  6.60E-03
 Successful solution
 
 ---------------------------------------------------
 Solver         :  APOPT (v1.0)
 Solution time  :    313.461699999985      sec
 Objective      :   -600.648283994940     
 Successful solution
 ---------------------------------------------------

The solution time increases to 313.46 sec. There is also more processing time to compile the model. You may want to start with smaller models and check how much it will increase the computational time. I also recommend that you use remote=False to solve locally instead of on the remote server.

Integer optimization problems can take exponentially longer with more variables so you'll want to ensure that you aren't starting a problem that will require 30 years to complete. A good way to check this is solve successively larger problems to get an idea of the scale-up.

GEKKO Exception: @error: Max Equation Length (Number of variables greater than 100k)

Answers (1)

Related Questions