OldSport

Reputation: 137

Find the value of variables to maximize return of function in Python

I want to achieve a result similar to what the Solver function in Excel produces. I've been reading about SciPy optimization and have been trying to build a function that outputs the value I would like to maximize. The function depends on four different variables; see my code below:

import pandas as pd
import numpy as np
from scipy import optimize

cols = {
    'Dividend2': [9390, 7448, 177], 
    'Probability': [341, 376, 452], 
    'EV': [0.53, 0.60, 0.55], 
    'Dividend': [185, 55, 755], 
    'EV2': [123, 139, 544],
}

df = pd.DataFrame(cols)

def myFunc(params):
    """myFunc metric."""
    (ev, bv, vc, dv) = params
    df['Number'] = np.where(df['Dividend2'] <= vc, 1, 0) \
                    + np.where(df['EV2'] <= dv, 1, 0)
    df['Return'] =  np.where(
        df['EV'] <= ev, 0, np.where(
            df['Probability'] >= bv, 0, df['Number'] * df['Dividend'] - (vc + dv)
        )
    )
    return -1 * (df['Return'].sum())

b1 = [(0.2,4), (300,600), (0,1000), (0,1000)]
start = [0.2, 600, 1000, 1000]
result = optimize.minimize(fun=myFunc, bounds=b1, x0=start)
print(result)

So I'd like to find the maximum value of the column Return in df when changing the variables ev, bv, vc & dv. I'd like them to stay within the intervals ev: 0.2-4, bv: 300-600, vc: 0-1000 & dv: 0-1000.

When running my code, it seems like the optimization stops at x0.

Upvotes: 7

Views: 2091

Answers (2)

CypherX

Reputation: 7353

Solution

I will use the optuna library to give you a solution to the type of problem you are trying to solve. I have tried using scipy.optimize.minimize, and it appears that the loss landscape is probably quite flat in most places, so the tolerances force the minimizing algorithm (L-BFGS-B) to stop prematurely.

With optuna, it is rather straightforward. Optuna only requires an objective function and a study. The study sends various trials to the objective function, which in turn evaluates the metric of your choice.
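To make that workflow concrete before the full code below, here is a bare-bones toy sketch of the optuna API (the quadratic objective is only for illustration and is not the metric from the question):

# Minimal optuna workflow: an objective that samples parameters from a trial,
# and a study that minimizes it.
import optuna

def toy_objective(trial):
    x = trial.suggest_float("x", -10, 10)   # sample a candidate value for x
    return (x - 2) ** 2                     # metric to minimize

study = optuna.create_study(direction="minimize")
study.optimize(toy_objective, n_trials=50)
print(study.best_params)                    # should be close to {'x': 2.0}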

I have defined another metric function, myFunc2, mostly by removing the np.where calls; you can do away with them (this reduces the number of steps) and make the function slightly faster.

# install optuna with pip
pip install -Uqq optuna

Sometimes it is necessary to visualize the loss landscape itself; Section A elaborates on visualization (using the studies created in Section B). And what if you want to use a smoother metric function? Section D sheds some light on this.

Order of code-execution should be:

  • Sections: C >> B >> B.1 >> B.2 >> B.3 >> A.1 >> A.2 >> D

A. Building Intuition

If you create a hiplot (also known as a parallel-coordinates plot) with all the possible parameter values from the search_space of Section B.2, and plot the lowest 50 outputs of myFunc2, it would look like this:

hiplot-search-space-subset

Plotting all such points from the search_space would look like this:

hiplot-search-space-full
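In case you want to reproduce such a parallel-coordinates view yourself, a rough sketch with the hiplot library could look like the following (this is not the exact code used for the figures above; it assumes hiplot is installed and that df, make_param_grid, bounds and myFunc2 from Sections C and B have already been run):

# Build a hiplot view of the search space (sketch).
import itertools
import hiplot as hip

grid = make_param_grid(bounds=bounds, num_points=10, as_dict=True)
rows = []
for ev, bv, vc, dv in itertools.product(*(grid[p] for p in ["ev", "bv", "vc", "dv"])):
    rows.append({"ev": ev, "bv": bv, "vc": vc, "dv": dv,
                 "metric": myFunc2((ev, bv, vc, dv))})

# Keep only the 50 lowest metric values, as in the first figure above.
rows = sorted(rows, key=lambda r: r["metric"])[:50]
hip.Experiment.from_iterable(rows).display()  # renders inside a notebook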

A.1. Loss Landscape Views for Various Parameter-Pairs

These figures show that the loss landscape is mostly flat for any two of the four parameters (ev, bv, vc, dv). This could be a reason why only GridSampler (which brute-forces the searching process) does better than the other two samplers (TPESampler and RandomSampler). It could also be the reason why scipy.optimize.minimize(method="L-BFGS-B") fails right off the bat.

[Figures 01-06: loss-landscape contour plots for the parameter pairs dv-vc, dv-bv, dv-ev, bv-ev, vc-ev, and vc-bv.]

# Create contour plots for parameter-pairs
study_name = "GridSampler"
study = studies.get(study_name)

views = [("dv", "vc"), ("dv", "bv"), ("dv", "ev"), 
         ("bv", "ev"), ("vc", "ev"), ("vc", "bv")]

for i, (x, y) in enumerate(views, start=1):
    print(f"Figure: {i}/{len(views)}")
    study_contour_plot(study=study, params=(x, y))
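If you prefer a quick numeric check without plots, the following rough probe (it assumes df, np and myFunc2 from Sections C and B are already loaded) varies one parameter at a time around the grid-search optimum: the printed values sit on flat plateaus along ev and bv and change only piecewise along vc and dv, which is exactly the kind of landscape that stalls gradient-based methods.

# Probe the metric along each parameter axis around the grid-search optimum.
base = {"ev": 0.2, "bv": 500.0, "vc": 222.2222, "dv": 0.0}
axes = [("ev", 0.2, 4), ("bv", 300, 600), ("vc", 0, 1000), ("dv", 0, 1000)]
for name, lo, hi in axes:
    vals = []
    for t in np.linspace(lo, hi, 11):
        p = dict(base, **{name: t})
        vals.append(myFunc2((p["ev"], p["bv"], p["vc"], p["dv"])))
    print(name, np.round(vals, 2))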

A.2. Parameter Importance

07-param-importance

study_name = "GridSampler"
study = studies.get(study_name)

fig = optuna.visualization.plot_param_importances(study)
fig.update_layout(title=f'Hyperparameter Importances: {study.study_name}', 
                  autosize=False,
                  width=800, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()

B. Code

Section B.2. finds the lowest metric -88.333 for:

  • {'ev': 0.2, 'bv': 500.0, 'vc': 222.2222, 'dv': 0.0}

import warnings
from functools import partial
from typing import Iterable, Optional, Callable, List, Tuple, Union, Dict

import pandas as pd
import numpy as np
import optuna
from tqdm.notebook import tqdm

warnings.filterwarnings("ignore", category=optuna.exceptions.ExperimentalWarning)
optuna.logging.set_verbosity(optuna.logging.WARNING)

PARAM_NAMES: List[str] = ["ev", "bv", "vc", "dv"]


def myFunc2(params):
    """myFunc metric v2 with fewer steps."""
    global df  # uses/updates the module-level DataFrame defined in Section C
    (ev, bv, vc, dv) = params
    df['Number'] = (df['Dividend2'] <= vc) * 1 + (df['EV2'] <= dv) * 1
    df['Return'] = (
        (df['EV'] > ev)
        * (df['Probability'] < bv)
        * (df['Number'] * df['Dividend'] - (vc + dv))
    )
    return -1 * (df['Return'].sum())


# NOTE: assigned after myFunc2 so that the name exists when this line runs
DEFAULT_METRIC_FUNC: Callable = myFunc2


def make_param_grid(
        bounds: List[Tuple[float, float]], 
        param_names: Optional[List[str]]=None, 
        num_points: int=10, 
        as_dict: bool=True,
    ) -> Union[pd.DataFrame, Dict[str, List[float]]]:
    """
    Create parameter search space.

    Example:
    
        grid = make_param_grid(bounds=b1, num_points=10, as_dict=True)
    
    """
    if param_names is None:
        param_names = PARAM_NAMES # ["ev", "bv", "vc", "dv"]
    bounds = np.array(bounds)
    grid = np.linspace(start=bounds[:,0], 
                       stop=bounds[:,1], 
                       num=num_points, 
                       endpoint=True, 
                       axis=0)
    grid = pd.DataFrame(grid, columns=param_names)
    if as_dict:
        grid = grid.to_dict()
        for k,v in grid.items():
            grid.update({k: list(v.values())})
    return grid


def objective(trial, 
              bounds: Optional[Iterable]=None, 
              func: Optional[Callable]=None, 
              param_names: Optional[List[str]]=None):
    """Objective function, necessary for optimizing with optuna."""
    if param_names is None:
        param_names = PARAM_NAMES
    if (bounds is None):
        bounds = ((-10, 10) for _ in param_names)
    if not isinstance(bounds, dict):
        bounds = dict((p, (min(b), max(b))) 
                        for p, b in zip(param_names, bounds))
    if func is None:
        func = DEFAULT_METRIC_FUNC

    params = dict(
        (p, trial.suggest_float(p, bounds.get(p)[0], bounds.get(p)[1])) 
        for p in param_names        
    )
    # x = trial.suggest_float('x', -10, 10)
    return func((params[p] for p in param_names))


def optimize(objective: Callable, 
             sampler: Optional[optuna.samplers.BaseSampler]=None, 
             func: Optional[Callable]=None, 
             n_trials: int=2, 
             study_direction: str="minimize",
             study_name: Optional[str]=None,
             formatstr: str=".4f",
             verbose: bool=True):
    """Optimizing function using optuna: creates a study."""
    if func is None:
        func = DEFAULT_METRIC_FUNC
    study = optuna.create_study(
        direction=study_direction, 
        sampler=sampler, 
        study_name=study_name)
    study.optimize(
        objective, 
        n_trials=n_trials, 
        show_progress_bar=True, 
        n_jobs=1,
    )
    if verbose:
        # evaluate the metric at the best parameters found by the study
        metric = func([study.best_params[p] for p in PARAM_NAMES])
        msg = format_result(study.best_params, metric, 
                            header=study.study_name, 
                            format=formatstr)
        print(msg)
    return study


def format_dict(d: Dict[str, float], format: str=".4f") -> Dict[str, float]:
    """
    Returns formatted output for a dictionary with 
    string keys and float values.
    """
    return dict((k, float(f'{v:{format}}')) for k,v in d.items())


def format_result(d: Dict[str, float], 
                  metric_value: float, 
                  header: str='', 
                  format: str=".4f"): 
    """Returns formatted result."""
    msg = f"""Study Name: {header}\n{'='*30}
    
    ✅ study.best_params: \n\t{format_dict(d, format=format)}
    ✅ metric: {metric_value} 
    """
    return msg


def study_contour_plot(study: optuna.Study, 
                       params: Optional[List[str]]=None, 
                       width: int=560, 
                       height: int=500):
    """
    Create contour plots for a study, given a list or 
    tuple of two parameter names.
    """
    if params is None:
        params = ["dv", "vc"]
    fig = optuna.visualization.plot_contour(study, params=params)
    fig.update_layout(
        title=f'Contour Plot: {study.study_name} ({params[0]}, {params[1]})', 
        autosize=False,
        width=width, 
        height=height,
        margin=dict(l=65, r=50, b=65, t=90))
    fig.show()


bounds = [(0.2, 4), (300, 600), (0, 1000), (0, 1000)]
param_names = PARAM_NAMES # ["ev", "bv", "vc", "dv",]
pobjective = partial(objective, bounds=bounds)

# Create an empty dict to contain 
# various subsequent studies.
studies = dict()

Optuna comes with a few different types of samplers. Samplers provide the strategy for how optuna is going to sample points from the parameter space and evaluate the objective function.

B.1 Use TPESampler

from optuna.samplers import TPESampler

sampler = TPESampler(seed=42)

study_name = "TPESampler"
studies[study_name] = optimize(
    pobjective, 
    sampler=sampler, 
    n_trials=100, 
    study_name=study_name,
)

# Study Name: TPESampler
# ==============================
#    
#     ✅ study.best_params: 
#   {'ev': 1.6233, 'bv': 585.2143, 'vc': 731.9939, 'dv': 598.6585}
#     ✅ metric: -0.0 

B.2. Use GridSampler

GridSampler requires a parameter search grid. Here we are using the search_space created below with make_param_grid (10 evenly spaced points per parameter within the bounds).

from optuna.samplers import GridSampler

# create search-space
search_space = make_param_grid(bounds=bounds, num_points=10, as_dict=True)

sampler = GridSampler(search_space)

study_name = "GridSampler"
studies[study_name] = optimize(
    pobjective, 
    sampler=sampler, 
    n_trials=2000, 
    study_name=study_name,
)

# Study Name: GridSampler
# ==============================
#    
#     ✅ study.best_params: 
#   {'ev': 0.2, 'bv': 500.0, 'vc': 222.2222, 'dv': 0.0}
#     ✅ metric: -88.33333333333337 
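As a quick sanity check (assuming df and myFunc2 from Sections C and B are in scope), plugging the reported best_params back into the metric reproduces this value:

best = {'ev': 0.2, 'bv': 500.0, 'vc': 222.2222, 'dv': 0.0}
print(myFunc2((best['ev'], best['bv'], best['vc'], best['dv'])))
# ≈ -88.333  (the exact grid point is 2000/9 ≈ 222.2222, hence a tiny rounding difference)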

B.3. Use RandomSampler

from optuna.samplers import RandomSampler

sampler = RandomSampler(seed=42)

study_name = "RandomSampler"
studies[study_name] = optimize(
    pobjective, 
    sampler=sampler, 
    n_trials=300, 
    study_name=study_name,
)

# Study Name: RandomSampler
# ==============================
#    
#     ✅ study.best_params: 
#   {'ev': 1.6233, 'bv': 585.2143, 'vc': 731.9939, 'dv': 598.6585}
#     ✅ metric: -0.0 

C. Dummy Data

For the sake of reproducibility, I am keeping a record of the dummy data used here.

import pandas as pd
import numpy as np
from scipy import optimize

cols = {
    'Dividend2': [9390, 7448, 177], 
    'Probability': [341, 376, 452], 
    'EV': [0.53, 0.60, 0.55], 
    'Dividend': [185, 55, 755], 
    'EV2': [123, 139, 544],
}

df = pd.DataFrame(cols)

def myFunc(params):
    """myFunc metric."""
    (ev, bv, vc, dv) = params
    df['Number'] = np.where(df['Dividend2'] <= vc, 1, 0) \
                    + np.where(df['EV2'] <= dv, 1, 0)
    df['Return'] =  np.where(
        df['EV'] <= ev, 0, np.where(
            df['Probability'] >= bv, 0, df['Number'] * df['Dividend'] - (vc + dv)
        )
    )
    return -1 * (df['Return'].sum())

b1 = [(0.2,4), (300,600), (0,1000), (0,1000)]
start = [0.2, 600, 1000, 1000]
result = optimize.minimize(fun=myFunc, bounds=b1, x0=start)
print(result)

C.1. An Observation

At first glance, it seems the code executed properly and did not throw any error. It even reports success in finding a minimized solution.

      fun: -0.0
 hess_inv: <4x4 LbfgsInvHessProduct with dtype=float64>
      jac: array([0., 0., 3., 3.])
  message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL' # 💡
     nfev: 35
      nit: 2
   status: 0
  success: True
        x: array([2.e-01, 6.e+02, 0.e+00, 0.e+00]) # 🔥

A close observation reveals that the solver barely explored the space (see 🔥): ev and bv did not move from the starting point [0.2, 600, 1000, 1000] at all, and after only 2 iterations vc and dv were simply pushed to their lower bounds, leaving fun at -0.0, far from the roughly -88.33 that the grid search in Section B.2 finds. So it seems the algorithm just finished prematurely?!
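You can verify this numerically (a quick check that assumes df and myFunc from Section C are defined) by comparing the point scipy stopped at with the best point found by the grid search in Section B.2:

print(myFunc([0.2, 600.0, 0.0, 0.0]))        # -0.0    <- where L-BFGS-B stopped
print(myFunc([0.2, 500.0, 222.2222, 0.0]))   # ≈ -88.33 <- GridSampler's best point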

Now look at the message above (see 💡): CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL. In L-BFGS-B this means the gradient, projected onto the feasible box, has a norm below the pgtol tolerance: the objective is flat along ev and bv (zero gradient), and at the lower bounds of vc and dv there is no feasible descent direction left, so the algorithm declares convergence even though it is nowhere near the best point.

D. Making the Loss Landscape Smoother

A binary evaluation like value = 1 if x > 5 else 0 is essentially a step function that assigns 1 to all values of x greater than 5 and 0 otherwise. But this introduces a kink, i.e. a discontinuity in smoothness, and that can cause problems when traversing the loss landscape.

What if we use a sigmoid function to introduce some smoothness?

# Define sigmoid function
def sigmoid(x):
    """Sigmoid function."""
    return 1 / (1 + np.exp(-x))

For the above example, we could replace value = 1 if x > 5 else 0 with value = sigmoid(x - 5).

You can additionally introduce a scaling factor (gamma: γ), as in value = sigmoid((x - 5) / gamma), and tune it to make the landscape smoother. By controlling the gamma factor you change how quickly the function transitions around x = 5.

sigmoid-demo

The above figure is created with the following code-snippet.

import matplotlib.pyplot as plt

%matplotlib inline 
%config InlineBackend.figure_format = 'svg' # 'svg', 'retina' 
plt.style.use('seaborn-white')

def make_figure(figtitle: str="Sigmoid Function"):
    """Make the demo figure for using sigmoid."""

    x = np.arange(-20, 20.01, 0.01)
    y1 = sigmoid(x)
    y2 = sigmoid(x - 5)
    y3 = sigmoid((x - 5)/3)
    y4 = sigmoid((x - 5)/0.3)
    fig, ax = plt.subplots(figsize=(10,5))
    plt.sca(ax)
    plt.plot(x, y1, ls="-", label=r"$\sigma(x)$")
    plt.plot(x, y2, ls="--", label=r"$\sigma(x - 5)$")
    plt.plot(x, y3, ls="-.", label=r"$\sigma((x - 5) / 3)$")
    plt.plot(x, y4, ls=":", label=r"$\sigma((x - 5) / 0.3)$")
    plt.axvline(x=0, ls="-", lw=1.3, color="cyan", alpha=0.9)
    plt.axvline(x=5, ls="-", lw=1.3, color="magenta", alpha=0.9)
    plt.legend()
    plt.title(figtitle)
    plt.show()

make_figure()

D.1. Example of Metric Smoothing

The following is an example of how you could apply function smoothing.

from functools import partial

def sig(x, gamma: float=1.):
    return sigmoid(x/gamma)

def myFunc3(params, gamma: float=0.5):
    """myFunc metric v3 with smoother metric."""
    (ev, bv, vc, dv) = params
    _sig = partial(sig, gamma=gamma)
    df['Number'] = _sig(x = -(df['Dividend2'] - vc)) * 1 \
                    + _sig(x = -(df['EV2'] - dv)) * 1
    df['Return'] = (
        _sig(x = df['EV'] - ev) 
        * _sig(x = -(df['Probability'] - bv))
        * _sig(x = df['Number'] * df['Dividend'] - (vc + dv))
    )
    return -1 * (df['Return'].sum())
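If you want to try the smoothed metric with the same optuna machinery, a sketch could look like the following (it assumes the objective, optimize, bounds and studies helpers from Section B are already defined):

# Optimize the smoothed metric myFunc3 instead of myFunc2.
from optuna.samplers import TPESampler

func3 = partial(myFunc3, gamma=0.5)
pobjective_smooth = partial(objective, bounds=bounds, func=func3)

studies["TPESampler-smooth"] = optimize(
    pobjective_smooth,
    sampler=TPESampler(seed=42),
    func=func3,              # so the reported best metric is also the smoothed one
    n_trials=100,
    study_name="TPESampler-smooth",
)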

Upvotes: 3

joni

Reputation: 7157

As already mentioned in my comment, the crucial problem is that np.where() is neither differentiable nor continuous. Consequently, your objective function violates the mathematical assumptions of most of the (derivative-based) algorithms under the hood of scipy.optimize.minimize.

So, basically, you've got three options:

  1. Use a derivative-free algorithm and hope for the best.
  2. Replace np.where() with a smooth approximation such that your objective is continuously differentiable.
  3. Reformulate your problem as a MIP.

Since @CypherX's answer pursues approach 1, I'd like to focus on approach 2. Here, the main idea is to approximate the np.where function. One possible approximation is

def smooth_if_then(x):
    # smooth approximation of the step function 1_{x >= 0};
    # eps avoids a division by zero at x = 0
    eps = 1e-12
    return 0.5 + x/(2*np.sqrt(eps + x*x))

which is continuous and differentiable. Then, given a np.ndarray arr and a scalar value x, the expression np.where(arr <= x, 1, 0) is equivalent to smooth_if_then(x - arr).
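A quick numerical check of this equivalence (assuming numpy and the smooth_if_then function above):

import numpy as np

arr = np.array([123.0, 139.0, 544.0])
x = 300.0
print(np.where(arr <= x, 1, 0))    # [1 1 0]
print(smooth_if_then(x - arr))     # ≈ [1. 1. 0.]  (exactly 0.5 only where arr == x)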

Hence, the objective function becomes:

div = df['Dividend'].values
div2 = df['Dividend2'].values
ev2 = df['EV2'].values
ev = df['EV'].values
prob = df['Probability'].values

def objective(x, *params):
    ev, bv, vc, dv = x
    div_vals, div2_vals, ev2_vals, ev_vals, prob_vals = params
    number = smooth_if_then(vc - div2_vals) + smooth_if_then(dv - ev2_vals)
    part1 = smooth_if_then(bv - prob_vals) * (number * div_vals - (vc + dv))
    part2 = smooth_if_then(-1*(ev - ev_vals)) * part1
    return -1 * part2.sum()

and using the trust-constr algorithm (which is the most robust one inside scipy.optimize.minimize) yields:

from scipy.optimize import minimize

res = minimize(lambda x: objective(x, div, div2, ev2, ev, prob), x0=start, bounds=b1, method="trust-constr")
print(res)

 barrier_parameter: 1.0240000000000006e-08
 barrier_tolerance: 1.0240000000000006e-08
          cg_niter: 5
      cg_stop_cond: 0
            constr: [array([8.54635975e-01, 5.99253512e+02, 9.95614973e+02, 9.95614973e+02])]
       constr_nfev: [0]
       constr_nhev: [0]
       constr_njev: [0]
    constr_penalty: 1.0
  constr_violation: 0.0
    execution_time: 0.2951819896697998
               fun: 1.3046631387761482e-08
              grad: array([0.00000000e+00, 0.00000000e+00, 8.92175218e-12, 8.92175218e-12])
               jac: [<4x4 sparse matrix of type '<class 'numpy.float64'>'
    with 4 stored elements in Compressed Sparse Row format>]
   lagrangian_grad: array([-3.60651033e-09,  4.89643010e-09,  2.21847918e-09,  2.21847918e-09])
           message: '`gtol` termination condition is satisfied.'
            method: 'tr_interior_point'
              nfev: 20
              nhev: 0
               nit: 14
             niter: 14
              njev: 4
        optimality: 4.896430096425101e-09
            status: 1
           success: True
         tr_radius: 478515625.0
                 v: [array([-3.60651033e-09,  4.89643010e-09,  2.20955743e-09,  2.20955743e-09])]
                 x: array([8.54635975e-01, 5.99253512e+02, 9.95614973e+02, 9.95614973e+02])

Last but not least: using smooth approximations is a common way to achieve differentiability. However, it's worth mentioning that these approximations are not convex. In practice, this means that your optimization problem is not convex, and thus you have no guarantee that a found stationary point (local minimizer) is a global optimum. To that end, one either needs to use a global optimization algorithm or formulate the problem as a MIP. The latter is the recommended approach, both from a mathematical and a practical point of view.
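For completeness, here is a sketch of the global-optimization route (not part of the original recommendation): scipy's differential_evolution is derivative-free, so it can be applied to the smoothed objective above, given the arrays and bounds defined earlier.

from scipy.optimize import differential_evolution

res_de = differential_evolution(
    lambda x: objective(x, div, div2, ev2, ev, prob),
    bounds=b1,
    seed=42,       # reproducibility
    polish=False,  # skip the final gradient-based polishing step
)
print(res_de.x, res_de.fun)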

Upvotes: 2
