Reputation: 33
I need to run an optimization for 100k to 500k variables, but it gives me max equation length reached an error. Can anyone help me out to set up this problem? Time is not a constraint as long as it takes 3-4 hours to run, it's fine.
df1 = df_opt.head(100000).copy()
#initialize model
m= GEKKO()
m.options.SOLVER=1
#initialize variable
x = np.array([m.Var(lb=0,ub=100,integer=True) for i in range(len(df1))])
#constraints
m.Equation(m.sum(x)<=30000)
#objective
responsiveness = np.array([m.Const(i) for i in df1['responsivness'].values])
affinity_score = np.array([m.Const(i) for i in df1['affinity'].values])
cost = np.array([m.Const(i) for i in df1['cost'].values])
expr = np.array([m.log(i) - k * j \
for i,j,k in zip((1+responsiveness * affinity_score * x),x,cost)])
m.Obj(-(m.sum(expr)))
#optimization
m.solve(disp=False)
Upvotes: 3
Views: 206
Reputation: 14346
When creating a question, it is important to have a Minimal Example that is complete. Here is a modification that creates a random DataFrame with n
rows.
from gekko import GEKKO
import numpy as np
import pandas as pd
n = 10
df1 = pd.DataFrame({'responsivness':np.random.rand(n),\
'affinity':np.random.rand(n),\
'cost':np.random.rand(n)})
print(df1.head())
#initialize model
m= GEKKO(remote=False)
m.options.SOLVER=1
#initialize variable
x = np.array([m.Var(lb=0,ub=100,integer=True) for i in range(len(df1))])
#constraints
m.Equation(m.sum(x)<=30000)
#objective
responsiveness = np.array([m.Const(i) for i in df1['responsivness'].values])
affinity_score = np.array([m.Const(i) for i in df1['affinity'].values])
cost = np.array([m.Const(i) for i in df1['cost'].values])
expr = np.array([m.log(i) - k * j \
for i,j,k in zip((1+responsiveness * affinity_score * x),x,cost)])
m.Obj(-(m.sum(expr)))
#optimization
m.solve(disp=True)
This solves successfully for n=10
with the random numbers selected.
--------- APM Model Size ------------
Each time step contains
Objects : 0
Constants : 30
Variables : 11
Intermediates: 0
Connections : 0
Equations : 2
Residuals : 2
Number of state variables: 11
Number of total equations: - 1
Number of slack variables: - 1
---------------------------------------
Degrees of freedom : 9
----------------------------------------------
Steady State Optimization with APOPT Solver
----------------------------------------------
Iter: 1 I: 0 Tm: 0.00 NLPi: 20 Dpth: 0 Lvs: 3 Obj: -1.35E+00 Gap: NaN
--Integer Solution: -1.34E+00 Lowest Leaf: -1.35E+00 Gap: 4.73E-03
Iter: 2 I: 0 Tm: 0.00 NLPi: 2 Dpth: 1 Lvs: 3 Obj: -1.34E+00 Gap: 4.73E-03
Successful solution
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 1.519999999436550E-002 sec
Objective : -1.34078995171088
Successful solution
---------------------------------------------------
The underlying model gk_model0.apm
can be accessed by navigating to m.path
or by using m.open_folder()
.
Model
Constants
i0 = 0.14255660947333681
i1 = 0.9112789578520111
i2 = 0.10526966142004568
i3 = 0.6255161023214897
i4 = 0.2434604974789274
i5 = 0.812768922376058
i6 = 0.555163868440599
i7 = 0.7286240480266872
i8 = 0.39643651685899695
i9 = 0.4664238475079081
i10 = 0.588654005219946
i11 = 0.7807594551372589
i12 = 0.623910408858981
i13 = 0.19421798736230456
i14 = 0.3061420839190525
i15 = 0.07764492888189267
i16 = 0.7276569154297892
i17 = 0.5630014016669598
i18 = 0.9633171115575193
i19 = 0.23310692223695684
i20 = 0.008089496373502647
i21 = 0.7533529530133879
i22 = 0.4218710975774087
i23 = 0.03329287687223692
i24 = 0.9136665338169284
i25 = 0.7528330460265494
i26 = 0.0810779357870034
i27 = 0.4183140612726107
i28 = 0.4381547602657835
i29 = 0.907339329732971
End Constants
Variables
int_v1 = 0, <= 100, >= 0
int_v2 = 0, <= 100, >= 0
int_v3 = 0, <= 100, >= 0
int_v4 = 0, <= 100, >= 0
int_v5 = 0, <= 100, >= 0
int_v6 = 0, <= 100, >= 0
int_v7 = 0, <= 100, >= 0
int_v8 = 0, <= 100, >= 0
int_v9 = 0, <= 100, >= 0
int_v10 = 0, <= 100, >= 0
End Variables
Equations
(((((((((int_v1+int_v2)+int_v3)+int_v4)+int_v5)+int_v6)+int_v7)+int_v8)+int_v9)+int_v10)<=30000
minimize (-((((((((((log((1+((((i0)*(i10)))*(int_v1))))-((i20)*(int_v1)))+(log((1+((((i1)*(i11)))*(int_v2))))-((i21)*(int_v2))))+(log((1+((((i2)*(i12)))*(int_v3))))-((i22)*(int_v3))))+(log((1+((((i3)*(i13)))*(int_v4))))-((i23)*(int_v4))))+(log((1+((((i4)*(i14)))*(int_v5))))-((i24)*(int_v5))))+(log((1+((((i5)*(i15)))*(int_v6))))-((i25)*(int_v6))))+(log((1+((((i6)*(i16)))*(int_v7))))-((i26)*(int_v7))))+(log((1+((((i7)*(i17)))*(int_v8))))-((i27)*(int_v8))))+(log((1+((((i8)*(i18)))*(int_v9))))-((i28)*(int_v9))))+(log((1+((((i9)*(i19)))*(int_v10))))-((i29)*(int_v10)))))
End Equations
End Model
You can avoid a large symbolic expression string by modifying the model as:
from gekko import GEKKO
import numpy as np
import pandas as pd
n = 5000
df1 = pd.DataFrame({'responsiveness':np.random.rand(n),\
'affinity':np.random.rand(n),\
'cost':np.random.rand(n)})
print(df1.head())
#initialize model
m= GEKKO(remote=False)
m.options.SOLVER=1
#initialize variable
x = np.array([m.Var(lb=0,ub=100,integer=True) for i in range(len(df1))])
#constraints
m.Equation(m.sum(list(x))<=30000)
#objective
responsiveness = df1['responsiveness'].values
affinity_score = df1['affinity'].values
cost = df1['cost'].values
[m.Maximize(m.log(i) - k * j) \
for i,j,k in zip((1+responsiveness * affinity_score * x),x,cost)]
#optimization
m.solve(disp=True)
m.open_folder()
This gives an underlying model of the following that does not increase in symbolic expression size with number of variables.
Model
Variables
int_v1 = 0, <= 100, >= 0
int_v2 = 0, <= 100, >= 0
int_v3 = 0, <= 100, >= 0
int_v4 = 0, <= 100, >= 0
int_v5 = 0, <= 100, >= 0
int_v6 = 0, <= 100, >= 0
int_v7 = 0, <= 100, >= 0
int_v8 = 0, <= 100, >= 0
int_v9 = 0, <= 100, >= 0
int_v10 = 0, <= 100, >= 0
v11 = 0
End Variables
Equations
v11<=30000
maximize (log((1+((0.16283879947305288)*(int_v1))))-((0.365323493448101)*(int_v1)))
maximize (log((1+((0.3509872155181691)*(int_v2))))-((0.12162206443479917)*(int_v2)))
maximize (log((1+((0.20134572143617518)*(int_v3))))-((0.47137701674279087)*(int_v3)))
maximize (log((1+((0.287818142242232)*(int_v4))))-((0.12042554857067544)*(int_v4)))
maximize (log((1+((0.48997709502894166)*(int_v5))))-((0.21084485862098745)*(int_v5)))
maximize (log((1+((0.6178277437136291)*(int_v6))))-((0.42602122419609056)*(int_v6)))
maximize (log((1+((0.13033555293152563)*(int_v7))))-((0.8796057438355324)*(int_v7)))
maximize (log((1+((0.5002025885707916)*(int_v8))))-((0.9703263879586648)*(int_v8)))
maximize (log((1+((0.7095523321888202)*(int_v9))))-((0.8498606490337451)*(int_v9)))
maximize (log((1+((0.6174815809937886)*(int_v10))))-((0.9390903075640681)*(int_v10)))
End Equations
Connections
int_v1 = sum_1.x[1]
int_v2 = sum_1.x[2]
int_v3 = sum_1.x[3]
int_v4 = sum_1.x[4]
int_v5 = sum_1.x[5]
int_v6 = sum_1.x[6]
int_v7 = sum_1.x[7]
int_v8 = sum_1.x[8]
int_v9 = sum_1.x[9]
int_v10 = sum_1.x[10]
v11 = sum_1.y
End Connections
Objects
sum_1 = sum(10)
End Objects
End Model
I fixed a bug in Gekko so you should be able to use m.Equation(m.sum(x)<=30000)
on the next release of Gekko instead of converting x
to a list. This modification now works for larger models that previously failed. I tested it with n=5000
.
Number of state variables: 5002
Number of total equations: - 2
Number of slack variables: - 1
---------------------------------------
Degrees of freedom : 4999
----------------------------------------------
Steady State Optimization with APOPT Solver
----------------------------------------------
Iter: 1 I: 0 Tm: 313.38 NLPi: 14 Dpth: 0 Lvs: 3 Obj: -6.05E+02 Gap: NaN
--Integer Solution: -6.01E+02 Lowest Leaf: -6.05E+02 Gap: 6.60E-03
Iter: 2 I: 0 Tm: 0.06 NLPi: 2 Dpth: 1 Lvs: 3 Obj: -6.01E+02 Gap: 6.60E-03
Successful solution
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 313.461699999985 sec
Objective : -600.648283994940
Successful solution
---------------------------------------------------
The solution time increases to 313.46 sec. There is also more processing time to compile the model. You may want to start with smaller models and check how much it will increase the computational time. I also recommend that you use remote=False
to solve locally instead of on the remote server.
Integer optimization problems can take exponentially longer with more variables so you'll want to ensure that you aren't starting a problem that will require 30 years to complete. A good way to check this is solve successively larger problems to get an idea of the scale-up.
Upvotes: 2