rorance_

Reputation: 369

Implementing a machine learning algorithm in python when output is generated one-at-a-time

I have a large, black-box model, which I am trying to calibrate, and I am trying to implement a basic machine-learning algorithm to assist the calibration, but I am getting stuck.

The method I have been using to solve this is as follows:

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [1, 2, 3, 4]
list_of_scalars = []
list_of_results = []

for scalar in np.arange(0, 2, 0.01):  # range() cannot take a float step
    list_of_scalars.append(scalar)
    y_pred = BlackBox.run(scalar)
    mse = mean_squared_error(y_true, y_pred)
    list_of_results.append(mse)

best_value = min(list_of_results)
best_value_index = list_of_results.index(best_value)
the_best_input = list_of_scalars[best_value_index]

This seems like a bad method because it always takes the same amount of time and assumes in advance that I know the range the scalar will occupy. I could fine-tune this method by fitting a line to the results and retrieving the minimum value, but I'd still have these problems.

It seems that some kind of machine learning algorithm would be a better approach here. However, I'm not sure what type of algorithm would suit this problem. My intuition says gradient descent, but I've not seen one implemented in this manner. The examples I've seen have a full dataset before running the descent, rather than the data being generated on the fly.

My best guess is that such an algorithm would need to be aware of the gradient between the current mean_squared_error and the previous mean_squared_error, and then adjust how much the scalar increases or decreases in response to this.

My best guess at mapping this out is as follows:

from sklearn.metrics import mean_squared_error

y_true = [1, 2, 3, 4]
scalar = 0.01  # Some arbitrarily small scalar value
mse = 9999999  # Some arbitrarily large mse
gradient = 2  # Some arbitrarily large gradient
threshold = 0.001  # The threshold under which the while loop will end

def some_algorithm(gradient, scalar) -> float:
    '''
    Takes the current gradient, and the current scalar, and determines how much to 
    adjust the scalar by
    '''
    ...
    return adjustment_factor

while gradient > threshold:
    y_pred = BlackBox.run(scalar)
    current_mse = mean_squared_error(y_true, y_pred)
    gradient = current_mse / mse  # change relative to the previous run
    adjustment_factor = some_algorithm(gradient, scalar)
    scalar *= adjustment_factor
    mse = current_mse  # remember the previous mse for the next iteration

I'm happy to use an out-of-the-box solution such as sklearn classes, but it is the implementation that I'm getting stuck on.
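
For concreteness, this is roughly the shape of out-of-the-box solution I am imagining, though I am only guessing that something like scipy.optimize.minimize_scalar is the right tool for a black-box scalar like this:

from scipy.optimize import minimize_scalar
from sklearn.metrics import mean_squared_error

y_true = [1, 2, 3, 4]

def objective(scalar):
    # Run the black-box model once and score it against the known targets
    y_pred = BlackBox.run(scalar)
    return mean_squared_error(y_true, y_pred)

# Bounded minimisation over the same (0, 2) range I was sweeping above
result = minimize_scalar(objective, bounds=(0, 2), method='bounded')
the_best_input = result.x  # the scalar that gave the lowest mse
best_value = result.fun    # the mse itself

But I don't know whether this is the right direction, or whether I should be writing the update loop myself.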

Upvotes: 1

Views: 149

Answers (1)

ferdy

Reputation: 5024

The goal of ML is to create a model that predicts well on some test dataset. But in your case you already have a model, as you said: "I have a large, black-box model, which I am trying to calibrate, ..."

To create a better model, try the following algorithm.

  1. Define an initial best_mse (say 0) and an initial model (say best_model = None). The starting value of best_mse does not matter, because the first model is handled by the best_model is None check in step 4.
  2. Create a model (there are different ways to create one), say current_model.
  3. Test the model with the test datasets and measure the mse as current_mse.
    • This is what you are trying to do, but with some corrections:
    y_true = [1, 2, 3, 4]
    x_test = [0.1, 0.2, 0.3, 0.4]  # your input
    y_pred = model(x_test)
    current_mse = mean_squared_error(y_true, y_pred)
    
    Basic idea behind mse or mean_squared_error.
    Sample y_pred results:
    y_pred = [0.9, 1.8, 4.5, 2.8]
    error = [1-0.9, 2-1.8, 3-4.5, 4-2.8]  # 1, 2, 3 and 4 are from y_true
    error = [0.1, 0.2, -1.5, 1.2]
    squared_error = [0.01, 0.04, 2.25, 1.44]  # 0.1*0.1, 0.2*0.2, (-1.5)*(-1.5) ...
    mean_squared_error = sum(squared_error) / len(squared_error)  # the mean is just the average
    mean_squared_error = 3.74/4 = 0.935
    
    If you really want to feed the inputs one at a time:
    all_error = []
    for testv, truev in zip(x_test, y_true):
        pred = BlackBox.run(testv)
        error = truev - pred
        squared_error = error * error
        all_error.append(squared_error)
    mse = sum(all_error) / len(all_error)
    
  4. If current_mse is better (lower) than best_mse, or if this is the first model, keep it as the new best:
if best_model is None:  # first time
    best_mse = current_mse
    best_model = current_model
elif current_mse < best_mse:
    best_mse = current_mse
    best_model = current_model

In the end you will have best_model and best_mse.
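
Putting steps 1 to 4 together, a minimal sketch could look like the following (the candidate scalar values are only placeholders, and I am assuming BlackBox.run(scalar) returns predictions for all of y_true, as in your first snippet):

from sklearn.metrics import mean_squared_error

y_true = [1, 2, 3, 4]
candidate_scalars = [0.01, 0.05, 0.1, 0.5, 1.0, 1.5]  # placeholders, choose your own candidates

best_mse = None
best_scalar = None

for scalar in candidate_scalars:      # step 2: each scalar is a candidate model
    y_pred = BlackBox.run(scalar)     # step 3: test it against the known targets
    current_mse = mean_squared_error(y_true, y_pred)
    if best_mse is None or current_mse < best_mse:  # step 4: keep the best one
        best_mse = current_mse
        best_scalar = scalar

print(best_scalar, best_mse)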

Upvotes: 1
