Reputation: 137
I want to achieve a result similar to what the Solver function in Excel does. I've been reading about SciPy optimization and have been trying to build a function that outputs the quantity I would like to maximize. The calculation is based on four different variables; see my code below:
import pandas as pd
import numpy as np
from scipy import optimize
cols = {
    'Dividend2': [9390, 7448, 177],
    'Probability': [341, 376, 452],
    'EV': [0.53, 0.60, 0.55],
    'Dividend': [185, 55, 755],
    'EV2': [123, 139, 544],
}
df = pd.DataFrame(cols)

def myFunc(params):
    """myFunc metric."""
    (ev, bv, vc, dv) = params
    df['Number'] = np.where(df['Dividend2'] <= vc, 1, 0) \
        + np.where(df['EV2'] <= dv, 1, 0)
    df['Return'] = np.where(
        df['EV'] <= ev, 0, np.where(
            df['Probability'] >= bv, 0, df['Number'] * df['Dividend'] - (vc + dv)
        )
    )
    return -1 * (df['Return'].sum())
b1 = [(0.2,4), (300,600), (0,1000), (0,1000)]
start = [0.2, 600, 1000, 1000]
result = optimize.minimize(fun=myFunc, bounds=b1, x0=start)
print(result)
So I'd like to find the maximum value of the column Return in df when changing the variables ev, bv, vc & dv, constrained to the intervals ev: 0.2-4, bv: 300-600, vc: 0-1000 & dv: 0-1000.
When running my code, it seems like the optimization stops at x0.
Upvotes: 7
Views: 2091
Reputation: 7353
I will use the optuna library to give you a solution to the type of problem you are trying to solve. I have tried using scipy.optimize.minimize, and it appears that the loss-landscape is probably quite flat in most places, hence the tolerances force the minimizing algorithm (L-BFGS-B) to stop prematurely.
With optuna, it is rather straightforward. Optuna only requires an objective function and a study. The study sends various trials to the objective function, which in turn evaluates the metric of your choice.
I have defined another metric function, myFunc2, mostly by removing the np.where calls, as you can do away with them (this reduces the number of steps) and make the function slightly faster.
# install optuna with pip
pip install -Uqq optuna
Although I looked into obtaining a rather smooth loss-landscape, it is sometimes necessary to visualize the landscape itself. The answer in Section B elaborates on visualization. But what if you want to use a smoother metric function? Section D sheds some light on this.
Order of code execution should be: C >> B >> B.1 >> B.2 >> B.3 >> A.1 >> A.2 >> D
If you create a hiplot (also known as a parallel-coordinates plot) with all the possible parameter values as mentioned in the search_space for Section B.2, and plot the lowest 50 outputs of myFunc2, it would look like this:
Plotting all such points from the search_space would look like this:
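Those hiplot figures are not reproduced here. As a rough substitute (a sketch, not the original hiplot code), optuna's built-in parallel-coordinate plot gives a similar view of a study once the studies dict from Section B has been populated:
# A sketch using optuna's built-in visualization instead of hiplot;
# assumes the `studies` dict from Section B below has been populated.
study = studies.get("GridSampler")
fig = optuna.visualization.plot_parallel_coordinate(
    study, params=["ev", "bv", "vc", "dv"])
fig.show()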
These figures show that the loss-landscape is mostly flat for any two of the four parameters (ev, bv, vc, dv). This could be a reason why only GridSampler (which brute-forces the searching process) does better, compared to the other two samplers (TPESampler and RandomSampler). It could also be the reason why scipy.optimize.minimize(method="L-BFGS-B") fails right off the bat.
# Create contour plots for parameter-pairs
study_name = "GridSampler"
study = studies.get(study_name)
views = [("dv", "vc"), ("dv", "bv"), ("dv", "ev"),
         ("bv", "ev"), ("vc", "ev"), ("vc", "bv")]
for i, (x, y) in enumerate(views):
    print(f"Figure: {i}/{len(views)}")
    study_contour_plot(study=study, params=(x, y))

study_name = "GridSampler"
study = studies.get(study_name)
fig = optuna.visualization.plot_param_importances(study)
fig.update_layout(title=f'Hyperparameter Importances: {study.study_name}',
                  autosize=False,
                  width=800, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()
Section B.3 finds the lowest metric, -88.333, for:
{'ev': 0.2, 'bv': 500.0, 'vc': 222.2222, 'dv': 0.0}
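As a quick sanity check (not part of the original answer), plugging those best parameters back into myFunc2 reproduces that metric, assuming df and myFunc2 from the code below are already defined:
# Re-evaluate myFunc2 at the best parameters reported by the GridSampler study.
best = {'ev': 0.2, 'bv': 500.0, 'vc': 222.2222, 'dv': 0.0}
print(myFunc2((best['ev'], best['bv'], best['vc'], best['dv'])))
# expected: approximately -88.333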
import warnings
from functools import partial
from typing import Callable, Dict, Iterable, List, Optional, Tuple, Union

import pandas as pd
import numpy as np
import optuna
from tqdm.notebook import tqdm

warnings.filterwarnings("ignore", category=optuna.exceptions.ExperimentalWarning)
optuna.logging.set_verbosity(optuna.logging.WARNING)

PARAM_NAMES: List[str] = ["ev", "bv", "vc", "dv"]

def myFunc2(params):
    """myFunc metric v2 with fewer steps."""
    global df  # defined as a global variable
    (ev, bv, vc, dv) = params
    df['Number'] = (df['Dividend2'] <= vc) * 1 + (df['EV2'] <= dv) * 1
    df['Return'] = (
        (df['EV'] > ev)
        * (df['Probability'] < bv)
        * (df['Number'] * df['Dividend'] - (vc + dv))
    )
    return -1 * (df['Return'].sum())

# Note: this must be assigned after myFunc2 has been defined.
DEFAULT_METRIC_FUNC: Callable = myFunc2
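As a small aside (not in the original answer), myFunc2 can be checked against the question's myFunc for a parameter tuple away from the thresholds, assuming both functions and df are defined:
# Sanity check: myFunc2 should reproduce myFunc for the same parameters.
p = (0.5, 400, 200, 150)
assert myFunc(p) == myFunc2(p)
print(myFunc(p), myFunc2(p))  # both print 460 for this dummy data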
def make_param_grid(
        bounds: List[Tuple[float, float]],
        param_names: Optional[List[str]] = None,
        num_points: int = 10,
        as_dict: bool = True,
) -> Union[pd.DataFrame, Dict[str, List[float]]]:
    """
    Create parameter search space.

    Example:

        grid = make_param_grid(bounds=b1, num_points=10, as_dict=True)

    """
    if param_names is None:
        param_names = PARAM_NAMES  # ["ev", "bv", "vc", "dv"]
    bounds = np.array(bounds)
    grid = np.linspace(start=bounds[:, 0],
                       stop=bounds[:, 1],
                       num=num_points,
                       endpoint=True,
                       axis=0)
    grid = pd.DataFrame(grid, columns=param_names)
    if as_dict:
        grid = grid.to_dict()
        for k, v in grid.items():
            grid.update({k: list(v.values())})
    return grid
def objective(trial,
              bounds: Optional[Iterable] = None,
              func: Optional[Callable] = None,
              param_names: Optional[List[str]] = None):
    """Objective function, necessary for optimizing with optuna."""
    if param_names is None:
        param_names = PARAM_NAMES
    if bounds is None:
        bounds = ((-10, 10) for _ in param_names)
    if not isinstance(bounds, dict):
        bounds = dict((p, (min(b), max(b)))
                      for p, b in zip(param_names, bounds))
    if func is None:
        func = DEFAULT_METRIC_FUNC
    params = dict(
        (p, trial.suggest_float(p, bounds.get(p)[0], bounds.get(p)[1]))
        for p in param_names
    )
    # x = trial.suggest_float('x', -10, 10)
    return func((params[p] for p in param_names))
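The optimize() wrapper below calls an eval_metric() helper that is not shown elsewhere in this post. A minimal sketch of what it presumably does (re-evaluate the metric for a study's best parameters) is:
def eval_metric(best_params: Dict[str, float],
                func: Optional[Callable] = None) -> float:
    """Evaluate the metric for a dict of best parameters (minimal sketch)."""
    if func is None:
        func = DEFAULT_METRIC_FUNC
    return func(tuple(best_params[p] for p in PARAM_NAMES))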
def optimize(objective: Callable,
             sampler: Optional[optuna.samplers.BaseSampler] = None,
             func: Optional[Callable] = None,
             n_trials: int = 2,
             study_direction: str = "minimize",
             study_name: Optional[str] = None,
             formatstr: str = ".4f",
             verbose: bool = True):
    """Optimizing function using optuna: creates a study."""
    if func is None:
        func = DEFAULT_METRIC_FUNC
    study = optuna.create_study(
        direction=study_direction,
        sampler=sampler,
        study_name=study_name)
    study.optimize(
        objective,
        n_trials=n_trials,
        show_progress_bar=True,
        n_jobs=1,
    )
    if verbose:
        metric = eval_metric(study.best_params, func=func)
        msg = format_result(study.best_params, metric,
                            header=study.study_name,
                            format=formatstr)
        print(msg)
    return study
def format_dict(d: Dict[str, float], format: str = ".4f") -> Dict[str, float]:
    """
    Returns formatted output for a dictionary with
    string keys and float values.
    """
    return dict((k, float(f'{v:{format}}')) for k, v in d.items())

def format_result(d: Dict[str, float],
                  metric_value: float,
                  header: str = '',
                  format: str = ".4f"):
    """Returns formatted result."""
    msg = f"""Study Name: {header}\n{'='*30}
    ✅ study.best_params: \n\t{format_dict(d, format=format)}
    ✅ metric: {metric_value}
    """
    return msg
def study_contour_plot(study: optuna.Study,
                       params: Optional[List[str]] = None,
                       width: int = 560,
                       height: int = 500):
    """
    Create contour plots for a study, given a list or
    tuple of two parameter names.
    """
    if params is None:
        params = ["dv", "vc"]
    fig = optuna.visualization.plot_contour(study, params=params)
    fig.update_layout(
        title=f'Contour Plot: {study.study_name} ({params[0]}, {params[1]})',
        autosize=False,
        width=width,
        height=height,
        margin=dict(l=65, r=50, b=65, t=90))
    fig.show()
bounds = [(0.2, 4), (300, 600), (0, 1000), (0, 1000)]
param_names = PARAM_NAMES # ["ev", "bv", "vc", "dv",]
pobjective = partial(objective, bounds=bounds)
# Create an empty dict to contain
# various subsequent studies.
studies = dict()
Optuna comes with a few different types of samplers. Samplers provide the strategy for how optuna is going to sample points from the parameter-space and evaluate the objective function.
TPESampler
from optuna.samplers import TPESampler
sampler = TPESampler(seed=42)
study_name = "TPESampler"
studies[study_name] = optimize(
pobjective,
sampler=sampler,
n_trials=100,
study_name=study_name,
)
# Study Name: TPESampler
# ==============================
#
# ✅ study.best_params:
# {'ev': 1.6233, 'bv': 585.2143, 'vc': 731.9939, 'dv': 598.6585}
# ✅ metric: -0.0
GridSampler
GridSampler requires a parameter search grid. Here we are using the following search_space.
from optuna.samplers import GridSampler
# create search-space
search_space = make_param_grid(bounds=bounds, num_points=10, as_dict=True)
sampler = GridSampler(search_space)
study_name = "GridSampler"
studies[study_name] = optimize(
pobjective,
sampler=sampler,
n_trials=2000,
study_name=study_name,
)
# Study Name: GridSampler
# ==============================
#
# ✅ study.best_params:
# {'ev': 0.2, 'bv': 500.0, 'vc': 222.2222, 'dv': 0.0}
# ✅ metric: -88.33333333333337
RandomSampler
from optuna.samplers import RandomSampler
sampler = RandomSampler(seed=42)
study_name = "RandomSampler"
studies[study_name] = optimize(
pobjective,
sampler=sampler,
n_trials=300,
study_name=study_name,
)
# Study Name: RandomSampler
# ==============================
#
# ✅ study.best_params:
# {'ev': 1.6233, 'bv': 585.2143, 'vc': 731.9939, 'dv': 598.6585}
# ✅ metric: -0.0
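Not part of the original answer: once all three studies are stored in the studies dict, a short loop over optuna's standard Study attributes compares their best results side by side:
# Compare the best result found by each sampler (assumes `studies` is populated).
for name, study in studies.items():
    print(f"{name:>13s}: best metric = {study.best_value:10.4f} | "
          f"best params = {format_dict(study.best_params)}")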
For the sake of reproducibility, I am keeping a record of the dummy data used here.
import pandas as pd
import numpy as np
from scipy import optimize
cols = {
    'Dividend2': [9390, 7448, 177],
    'Probability': [341, 376, 452],
    'EV': [0.53, 0.60, 0.55],
    'Dividend': [185, 55, 755],
    'EV2': [123, 139, 544],
}
df = pd.DataFrame(cols)

def myFunc(params):
    """myFunc metric."""
    (ev, bv, vc, dv) = params
    df['Number'] = np.where(df['Dividend2'] <= vc, 1, 0) \
        + np.where(df['EV2'] <= dv, 1, 0)
    df['Return'] = np.where(
        df['EV'] <= ev, 0, np.where(
            df['Probability'] >= bv, 0, df['Number'] * df['Dividend'] - (vc + dv)
        )
    )
    return -1 * (df['Return'].sum())

b1 = [(0.2, 4), (300, 600), (0, 1000), (0, 1000)]
start = [0.2, 600, 1000, 1000]
result = optimize.minimize(fun=myFunc, bounds=b1, x0=start)
print(result)
At first glance, it seems that the code executed properly and did not throw any error. It even reports success in finding a minimizing solution.
fun: -0.0
hess_inv: <4x4 LbfgsInvHessProduct with dtype=float64>
jac: array([0., 0., 3., 3.])
message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL' # 💡
nfev: 35
nit: 2
status: 0
success: True
x: array([2.e-01, 6.e+02, 0.e+00, 0.e+00]) # 🔥
A close observation reveals that the reported solution (see 🔥) achieves the same objective value (-0.0) as the starting point [0.2, 600, 1000, 1000], and the algorithm stopped after only 2 iterations. So it seems like nothing really happened and the algorithm just terminated prematurely?!
Now look at the message above (see 💡). If you run a google search on it, you could find something like this:
Summary
b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
If the loss-landscape does not have a smoothly changing topography, a gradient-based algorithm will soon find that not much changes from one iteration to the next and hence terminate the search. Similarly, if the loss-landscape is rather flat, the search can meet the same fate and terminate early.
A binary evaluation such as value = 1 if x > 5 else 0 is essentially a step function that assigns 1 to all values of x greater than 5 and 0 otherwise. But this introduces a kink, a discontinuity in smoothness, and that could potentially cause problems when traversing the loss-landscape.
What if we use a sigmoid function to introduce some smoothness?
# Define sigmoid function
def sigmoid(x):
    """Sigmoid function."""
    return 1 / (1 + np.exp(-x))
For the above example, we could modify it as follows.
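The modified snippet itself is not included in the post; a minimal sketch of the idea, reusing the sigmoid defined above, could be:
# Smooth replacement for the hard threshold at x = 5 (a sketch).
x = 6.0                           # any example input
value_hard = 1 if x > 5 else 0    # step: jumps from 0 to 1 at x = 5
value_smooth = sigmoid(x - 5)     # smooth: ~0 for x << 5, ~1 for x >> 5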
You can additionally introduce another factor, gamma (γ), as shown below, and tune it to make the landscape smoother. By controlling the gamma factor, you can make the function smoother and change how quickly it transitions around x = 5.
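A sketch of that gamma-scaled variant (gamma here is a free smoothing parameter, not a value given in the post):
def smooth_step(x, gamma: float = 1.0):
    """Smoothed step around x = 5; larger gamma -> smoother transition."""
    return sigmoid((x - 5) / gamma)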
A figure comparing these sigmoid variants can be created with the following code snippet.
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'svg'  # 'svg', 'retina'
plt.style.use('seaborn-white')

def make_figure(figtitle: str = "Sigmoid Function"):
    """Make the demo figure for using sigmoid."""
    x = np.arange(-20, 20.01, 0.01)
    y1 = sigmoid(x)
    y2 = sigmoid(x - 5)
    y3 = sigmoid((x - 5)/3)
    y4 = sigmoid((x - 5)/0.3)
    fig, ax = plt.subplots(figsize=(10, 5))
    plt.sca(ax)
    plt.plot(x, y1, ls="-", label=r"$\sigma(x)$")
    plt.plot(x, y2, ls="--", label=r"$\sigma(x - 5)$")
    plt.plot(x, y3, ls="-.", label=r"$\sigma((x - 5) / 3)$")
    plt.plot(x, y4, ls=":", label=r"$\sigma((x - 5) / 0.3)$")
    plt.axvline(x=0, ls="-", lw=1.3, color="cyan", alpha=0.9)
    plt.axvline(x=5, ls="-", lw=1.3, color="magenta", alpha=0.9)
    plt.legend()
    plt.title(figtitle)
    plt.show()

make_figure()
The following is an example of how you could apply function smoothing.
from functools import partial

def sig(x, gamma: float = 1.):
    return sigmoid(x / gamma)

def myFunc3(params, gamma: float = 0.5):
    """myFunc metric v3 with smoother metric."""
    (ev, bv, vc, dv) = params
    _sig = partial(sig, gamma=gamma)
    df['Number'] = _sig(x = -(df['Dividend2'] - vc)) * 1 \
        + _sig(x = -(df['EV2'] - dv)) * 1
    df['Return'] = (
        _sig(x = df['EV'] - ev)
        * _sig(x = -(df['Probability'] - bv))
        * _sig(x = df['Number'] * df['Dividend'] - (vc + dv))
    )
    return -1 * (df['Return'].sum())
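Not shown in the original answer: a sketch of feeding this smoothed metric to scipy.optimize.minimize, reusing b1 and start from the question; the gamma value here is an arbitrary choice.
# Optimize the smoothed metric (a sketch). Note: for large |x|/gamma,
# np.exp inside sigmoid can overflow and emit a RuntimeWarning; the
# result still saturates at 0 or 1.
from scipy.optimize import minimize

res_smooth = minimize(
    fun=lambda p: myFunc3(p, gamma=0.5),
    x0=start,
    bounds=b1,
    method="L-BFGS-B",
)
print(res_smooth.x, res_smooth.fun)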
Upvotes: 3
Reputation: 7157
As already mentioned in my comment, the crucial problem is that np.where() is neither differentiable nor continuous. Consequently, your objective function violates the mathematical assumptions of most of the (derivative-based) algorithms under the hood of scipy.optimize.minimize.
So, basically, you've got three options:
1. Use a derivative-free algorithm.
2. Replace np.where() with a smooth approximation such that your objective is continuously differentiable.
3. Model the problem as a MIP (mixed-integer program).
Since @CypherX's answer pursues approach 1, I'd like to focus on approach 2. Here, the main idea is to approximate the np.where function. One possible approximation is
function. One possible approximation is
def smooth_if_then(x):
    eps = 1e-12
    return 0.5 + x/(2*np.sqrt(eps + x*x))
which is continuous and differentiable. Then, given an np.ndarray arr and a scalar value x, the expression np.where(arr <= x, 1, 0) can be approximated by smooth_if_then(x - arr).
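A quick numerical check of that correspondence (a sketch, not from the original answer; values exactly at the threshold are smoothed to 0.5):
arr = np.array([1.0, 4.0, 10.0])
x = 5.0
print(np.where(arr <= x, 1, 0))   # -> [1 1 0]
print(smooth_if_then(x - arr))    # -> values very close to [1. 1. 0.]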
Hence, the objective function becomes:
div = df['Dividend'].values
div2 = df['Dividend2'].values
ev2 = df['EV2'].values
ev = df['EV'].values
prob = df['Probability'].values
def objective(x, *params):
    ev, bv, vc, dv = x
    div_vals, div2_vals, ev2_vals, ev_vals, prob_vals = params
    number = smooth_if_then(vc - div2_vals) + smooth_if_then(dv - ev2_vals)
    part1 = smooth_if_then(bv - prob_vals) * (number * div_vals - (vc + dv))
    part2 = smooth_if_then(-1*(ev - ev_vals)) * part1
    return -1 * part2.sum()
and using the trust-constr algorithm (which is the most robust one inside scipy.optimize.minimize) yields:
from scipy.optimize import minimize

res = minimize(lambda x: objective(x, div, div2, ev2, ev, prob), x0=start,
               bounds=b1, method="trust-constr")
print(res)
barrier_parameter: 1.0240000000000006e-08
barrier_tolerance: 1.0240000000000006e-08
cg_niter: 5
cg_stop_cond: 0
constr: [array([8.54635975e-01, 5.99253512e+02, 9.95614973e+02, 9.95614973e+02])]
constr_nfev: [0]
constr_nhev: [0]
constr_njev: [0]
constr_penalty: 1.0
constr_violation: 0.0
execution_time: 0.2951819896697998
fun: 1.3046631387761482e-08
grad: array([0.00000000e+00, 0.00000000e+00, 8.92175218e-12, 8.92175218e-12])
jac: [<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>]
lagrangian_grad: array([-3.60651033e-09, 4.89643010e-09, 2.21847918e-09, 2.21847918e-09])
message: '`gtol` termination condition is satisfied.'
method: 'tr_interior_point'
nfev: 20
nhev: 0
nit: 14
niter: 14
njev: 4
optimality: 4.896430096425101e-09
status: 1
success: True
tr_radius: 478515625.0
v: [array([-3.60651033e-09, 4.89643010e-09, 2.20955743e-09, 2.20955743e-09])]
x: array([8.54635975e-01, 5.99253512e+02, 9.95614973e+02, 9.95614973e+02])
Last but not least: using smooth approximations is a common way to achieve differentiability. However, it's worth mentioning that these approximations are not convex. In practice, this means that your optimization problem is not convex, and thus you have no guarantee that a found stationary point (local minimizer) is a global optimum. To this end, one either needs to use a global optimization algorithm or formulate the problem as a MIP. The latter is the recommended approach, both from a mathematical and a practical point of view.
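The post stops short of showing either route. As one illustration of the global-optimization option (not the MIP formulation the answer actually recommends), a sketch using scipy.optimize.differential_evolution on the smoothed objective above:
# Derivative-free global search within the bounds (a sketch, not the MIP route).
from scipy.optimize import differential_evolution

res_global = differential_evolution(
    lambda x: objective(x, div, div2, ev2, ev, prob),
    bounds=b1,
    seed=42,      # for reproducibility
    maxiter=200,
)
print(res_global.x, res_global.fun)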
Upvotes: 2