Reputation: 21
I am currently trying to solve a non linear problem very similar to the one provided here: https://aws-ml-blog.s3.amazonaws.com/artifacts/prevent_churn_by_optimizing_incentives/preventing_customer_churn_by_optimizing_incentives.html. It has a non linear objective function and linear constraint. My problem size is about 1.5 Million variables (actual customers). Apologies in advance, as I am not able to provide a sample of the data as it contains personal information. However my data is quite similar to the example, except for the problem size. (I tried to linearize the objective but have not been successful yet).
I first used Gekko (with remote =False, IPOPT) as suggested in the article but despite running for several hours (10-12 hours) on AWS instance ml-m4-16xlarge - it does not return a solution. I did try a smaller size (about 100K variables) and the result was the same. I tried CyIPOPT but it errors out due to memory issues. To try a simpler example, I used an example recommended here (GEKKO Exception: @error: Max Equation Length (Number of variables greater than 100k)) and increased problem size to 100K variables but the kernel was unresponsive for Gekko. What steps could I attempt next to try and fix this issue?
Update based on first comment:
Thank you.
I first tried a small example of 500 variables (code attached below for my problem with sample data) and it ran successfully. However, I noticed that despite specifying IPOPT the model used APOPT (images below)
when I increased the size N to 5000 I got the Max Equation Length exception stated in the other thread. I then tried to modify the problem as you had stated there
But at the bottom of the solution the solver is stated as IPOPT:
from gekko import GEKKO
import numpy as np
import pandas as pd
alpha = np.random.uniform(low=0.01, high=0.99, size=(500,))
P= np.random.uniform(low=2, high=250, size=(500,))
N = 500
gamma = np.ones(N)
len(np.where(alpha > 0.80)[0])
indices_gamma_eq_zero = np.union1d(np.where(alpha > 0.80)[0], np.where(alpha < 0.40)[0])
gamma[indices_gamma_eq_zero] = 10
gamma
m = GEKKO(remote=False)
m.options.SOLVER = 3 #IPOPT Solver
m.options.IMODE = 3
C = 100
# variable array dimension
#create array
#x = m.Array(m.Var,N)
#for i in range(N):
# x[i].value = C / N
# x[i].lower = 0
# x[i].upper = 50
#initialize variable
x = np.array([m.Var(lb=0,ub=50) for i in range(N)])
#constraints
m.Equation(m.sum(list(x))<=C)
beta = [1 - m.exp(-gamma[i] * x[i]) for i in range(N)]
ival = [m.Intermediate(beta[i] * (alpha[i] * P[i] - x[i])) for i in range(N)]
m.Obj(-sum(ival))
# minimize expected cost /maximize expected profit
m.solve()
print(x)
This is the slightly revised code:
from gekko import GEKKO
import numpy as np
import pandas as pd
alpha = np.random.uniform(low=0.01, high=0.99, size=(5000,))
P= np.random.uniform(low=2, high=250, size=(5000,))
N = 5000
gamma = np.ones(N)
len(np.where(alpha > 0.80)[0])
indices_gamma_eq_zero = np.union1d(np.where(alpha > 0.80)[0], np.where(alpha < 0.40)[0])
gamma[indices_gamma_eq_zero] = 10
gamma
m = GEKKO(remote=False)
m.options.SOLVER = 3 #IPOPT Solver
m.options.IMODE = 3
C = 500
#initialize variable
x = np.array([m.Var(lb=0,ub=50) for i in range(N)])
#constraints
m.Equation(m.sum(list(x))<=C)
#objective
alpha=pd.DataFrame(alpha,columns=['alpha'])
P=pd.DataFrame(P,columns=['P'])
gamma=pd.DataFrame(gamma,columns=['gamma'])
dataset= pd.concat([alpha,P,gamma],axis=1)
P = dataset['P'].values
alpha = dataset['alpha'].values
gamma = dataset['gamma'].values
beta = [1 - m.exp(-gamma[i] * x[i]) for i in range(N)]
[m.Maximize(beta * (alpha * P - x))]
#optimization
m.solve(disp=True)
But now I get this error:
Upvotes: 2
Views: 412
Reputation: 14376
Try scaling up the problem size to get an idea of how long the optimization will take for 100K variables. Here is the simple example with the scale-up results for 100-2000 variables.
from gekko import GEKKO
import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt
s = np.arange(100,2010,100)
total = []; solve = []
for n in s:
print(n)
start = time.time()
df1 = pd.DataFrame({'responsiveness':np.random.rand(n),\
'affinity':np.random.rand(n),\
'cost':np.random.rand(n)})
m= GEKKO(remote=False)
m.options.SOLVER=1
x = np.array([m.Var(lb=0,ub=100,integer=True) for i in range(len(df1))])
m.Equation(m.sum(list(x))<=30000)
responsiveness = df1['responsiveness'].values
affinity_score = df1['affinity'].values
cost = df1['cost'].values
[m.Maximize(m.log(i) - k * j) \
for i,j,k in zip((1+responsiveness * affinity_score * x),x,cost)]
m.solve(disp=False)
total.append(time.time()-start)
solve.append(m.options.SOLVETIME)
#plt.plot(s,total,label='total time')
plt.plot(s,solve,label='solve time')
plt.xlabel('Size (n)'); plt.ylabel('Time (sec)')
plt.grid(); plt.legend(); plt.show()
Every method has a different scale-up profile as shown by this study on linear regression with 6 different Python packages.
There are specific things that can be done to most models to improve the speed. More specific recommendations can be given with a simplified version of the model (fake data is okay too).
Response to Edit
APOPT
and BPOPT
solvers. ARM Linux local solver is currently only BPOPT
. Windows has a local solver option for APOPT
, BPOPT
, and IPOPT
. All solvers are available when using m=GEKKO(remote=True)
to solve in the cloud. Gekko automatically switches to an available solver.[m.Maximize(beta[i] * (alpha * P - x[i])) for i in range(N)]
You can inspect the model file by opening the run directory with m.open_folder()
and the file gk0_model.apm
with a text editor. The objective function was likely written as a list. Each element of the objective function needs to be in a separate m.Maximize()
function or else use m.sum()
to create a summation and one objective function.
If you have further questions on this problem, I recommend that you create a new question and reference this one.
Upvotes: 1