Hedwin

Reputation: 73

How can I minimize a problem using PuLP when the value to minimize is evaluated by a function?

I have a class that evaluates a result for a given input list of parameters. I would like to find the set of parameters that minimises the result returned by the class. Since the inputs must remain integers, I read that PuLP might be a better fit than scipy.optimize.minimize().

I tried to implement it, but the line prob += ... doesn't work. Here is a simplified version of the code that reproduces the problem. It seems I am missing something important about how PuLP works.

import pandas as pd
import pulp as pl

class ComplexClass():
    
    def __init__(self, path_data:str):
        # self.data = pd.read_csv(path_data)
        d = {'col1': [i for i in range(100)], 'col2': [i**2 for i in range(100)]}
        self.df = pd.DataFrame(data=d)
        
    def eval_value(self, param: list):
        self.p1 = param[0]
        self.p2 = param[1]
        
        # [...] Complex stuff manipulating the pandas DataFrame
        self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean()
        self.df['col4'] = self.df['col2'].rolling(window=self.p2).mean()
        self.df['col5'] = self.df['col3']+self.df['col4']
        self.best_value = self.df['col5'].iloc[-1] 
        
def func_to_minimize(input_parm:list, initialized_class):
    initialized_class.eval_value(input_parm)
    return initialized_class.best_value

path = './path2data/data.csv'
my_class = ComplexClass(path_data=path)

# ===== INIT TEST =====
my_param = [2, 10]
print(f'For input {my_param}\n    => Value to minimize = {func_to_minimize(input_parm=my_param, initialized_class=my_class)}')

# ====== OPTIMIZATION WITH PULP =====
# Create the 'prob' variable to contain the problem data
prob = pl.LpProblem("Best_Param", pl.LpMinimize)
    
# The two integer variables param1 and param2 are created with their bounds
x1 = pl.LpVariable("param1", lowBound=2, upBound=10, cat=pl.LpInteger)
x2 = pl.LpVariable("param2", lowBound=2, upBound=20, cat=pl.LpInteger)

# The objective function is added to 'prob' first
prob += func_to_minimize([x1, x2], my_class)    # <= doesn't work

# The problem data is written to an .lp file
prob.writeLP("Best_Param.lp")

# The problem is solved using PuLP's choice of Solver
prob.solve()

The error occurs when evaluating self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean() and is: ValueError: window must be an integer 0 or greater. Indeed, the self.p1 used as window is an LpVariable, not an integer.

How can I minimize a problem using PuLP when the value to be minimised must be evaluated by a function rather than expressed as a linear formula?

Thank you.

Upvotes: 0

Views: 68

Answers (1)

Ingwersen_erik

Reputation: 2263

The problem is that you're trying to use PuLP, a linear programming (LP) optimization library, with a function that involves operations that aren't linear or integer-programmable. PuLP is designed specifically for linear and integer programming, meaning the objective function and constraints must be formulated as linear (or affine) expressions of the decision variables. Here's how you can address the issue you're encountering:

Understanding the Problem

PuLP can't directly handle objective functions that are evaluated by non-linear operations or involve calls to other Python functions or methods that manipulate data dynamically, such as rolling window calculations in pandas. PuLP variables (LpVariable) are not actual numbers until after the optimization problem is solved. Therefore, they can't be used directly in calculations or functions that require concrete numeric inputs during problem definition.
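To see the mismatch concretely, here is a minimal sketch (not from the original post) contrasting what fails with what PuLP accepts:

```python
import pulp as pl

# An LpVariable is a symbolic placeholder, not a number: its value
# only exists after the problem is solved.
x = pl.LpVariable("param1", lowBound=2, upBound=10, cat=pl.LpInteger)
print(type(x))

# Anything that needs a concrete integer during problem definition
# (like pandas' rolling window) cannot accept such a placeholder.

# What PuLP *does* accept is a linear/affine expression of its variables:
prob = pl.LpProblem("demo", pl.LpMinimize)
prob += 3 * x + 1  # fine: affine in x
```

This is why prob += func_to_minimize([x1, x2], my_class) fails: the function tries to use the placeholders as real integers before any solving has happened.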

Possible Solution

Given that you need to optimize parameters based on evaluations done by a class method that processes data dynamically, one possible approach would be to use metaheuristic algorithms such as Genetic Algorithms, Simulated Annealing, or others that do not require the gradient of the objective function or even its linearity. Libraries like deap, simanneal, or pyeasyga can be useful here. These algorithms can handle arbitrary objective functions and can work with integer constraints.
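As a baseline before reaching for a metaheuristic: the search space in the question is tiny (9 × 19 = 171 integer combinations), so exhaustive enumeration already guarantees the true optimum. A minimal sketch, reusing the data from the question:

```python
import itertools
import pandas as pd

# Same synthetic data as in the question.
d = {'col1': list(range(100)), 'col2': [i**2 for i in range(100)]}
df = pd.DataFrame(data=d)

def objective(p1: int, p2: int) -> float:
    # Same computation as ComplexClass.eval_value, as a plain function.
    col3 = df['col1'].rolling(window=p1).mean()
    col4 = df['col2'].rolling(window=p2).mean()
    return (col3 + col4).iloc[-1]

# Enumerate every integer combination within the bounds from the question.
best = min(itertools.product(range(2, 11), range(2, 21)),
           key=lambda p: objective(*p))
print(best, objective(*best))  # (10, 20) 8138.0
```

For real problems the metaheuristics below scale far better, but a brute-force check like this is a useful sanity reference when the domain is small.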

Example Implementation Using a Genetic Algorithm

Here's a conceptual outline using a genetic algorithm (via the deap library) to handle the problem:

NOTE: Make sure you have the deap library installed, or install it using pip install deap before running the example below.

import random
import pandas as pd
from deap import base, creator, tools, algorithms

class ComplexClass():

    def __init__(self, path_data:str):
        # self.data = pd.read_csv(path_data)
        d = {'col1': [i for i in range(100)], 'col2': [i**2 for i in range(100)]}
        self.df = pd.DataFrame(data=d)
        
    def eval_value(self, param: list):
        self.p1 = param[0]
        self.p2 = param[1]
        
        # [...] Complex stuff manipulating the pandas DataFrame
        self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean()
        self.df['col4'] = self.df['col2'].rolling(window=self.p2).mean()
        self.df['col5'] = self.df['col3'] + self.df['col4']
        self.best_value = self.df['col5'].iloc[-1]


def func_to_minimize(input_parm:list, initialized_class):
    initialized_class.eval_value(input_parm)
    return initialized_class.best_value


# Define the evaluation function for the genetic algorithm
def evalGA(params):
    result = func_to_minimize(params, my_class)
    return (result,)


# Setting up the genetic algorithm
my_class = ComplexClass("")
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)

toolbox = base.Toolbox()

# Register a function named "attribute_param1" that generates random numbers
# between 2 and 10. This function will be used to generate values for the
# variable "param1"
toolbox.register("attribute_param1", random.randint, 2, 10)

# Do the same to register a generator for the variable "param2"; only the
# interval changes: "param2" ranges between 2 and 20
toolbox.register("attribute_param2", random.randint, 2, 20)

# Register an "individual" factory that draws one value from each of the
# two generators above
toolbox.register("individual", tools.initCycle, creator.Individual,
                 (toolbox.attribute_param1, toolbox.attribute_param2),
                 n=1)

toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", evalGA)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=[2, 2], up=[10, 20], indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)

# Create population and run the genetic algorithm
population = toolbox.population(n=50)
result = algorithms.eaSimple(population, toolbox, cxpb=0.5, mutpb=0.2, ngen=40, verbose=True)

best_individual = tools.selBest(population, 1)[0]

print('Best Parameters:', best_individual)
print('Best Score:', best_individual.fitness.values[0])
# Prints:
# 
# Best Parameters: [10, 20]
# Best Score: 8138.0

This approach directly evaluates your class method within the genetic algorithm's fitness function, allowing for any complexity in the computation that you need.

You can find more information on how to use deap here: https://deap.readthedocs.io/en

Upvotes: 0
