Reputation: 369
I have a large, black-box model, which I am trying to calibrate, and I am trying to implement a basic machine-learning algorithm to assist the calibration, but I am getting stuck.
The model takes a single input, a float called scalar, and generates an output which is a list of floating-point values called y_pred. I want to adjust scalar so that the output, y_pred, is as close as possible to a known set of values called y_true. I compare y_pred and y_true via the mean squared error. So I have one input, scalar, and I want to minimise the mean squared error.
The method I have been using to solve this is as follows:
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [1, 2, 3, 4]
list_of_scalars = []
list_of_results = []

# Brute-force sweep over a fixed range of candidate scalars
for scalar in np.arange(0, 2, 0.01):  # range() does not accept a float step
    list_of_scalars.append(scalar)
    y_pred = BlackBox.run(scalar)
    mse = mean_squared_error(y_true, y_pred)
    list_of_results.append(mse)

best_value = min(list_of_results)
best_value_index = list_of_results.index(best_value)
the_best_input = list_of_scalars[best_value_index]
This seems like a bad method: it always takes the same amount of time, and it assumes in advance that I know the range that scalar will occupy. I could fine-tune this method by fitting a line to the results and retrieving the minimum value, but I'd still have these problems.
It seems that some kind of machine-learning algorithm would be a better approach here. However, I'm not sure what type of algorithm would suit this problem. My intuition says gradient descent, but I've not seen one implemented in this manner: the examples I've seen start from an existing dataset, rather than generating the data on the fly.
My best guess is that such an algorithm would need to be aware of the gradient between the current mean_squared_error and the previous one, and then adjust how much the scalar increases or decreases in response.
My best guess at mapping this out is as follows:
from sklearn.metrics import mean_squared_error

y_true = [1, 2, 3, 4]
scalar = 0.01      # Some arbitrarily small scalar value
mse = 9999999      # Some arbitrarily large mse
gradient = 2       # Some arbitrarily large gradient
threshold = 0.001  # The threshold under which the while loop will end

def some_algorithm(gradient, scalar) -> float:
    '''
    Takes the current gradient and the current scalar, and determines how much
    to adjust the scalar by.
    '''
    ...
    return adjustment_factor

while gradient > threshold:
    y_pred = BlackBox.run(scalar)
    current_mse = mean_squared_error(y_true, y_pred)
    gradient = current_mse / mse  # ratio of the new error to the previous one
    mse = current_mse             # remember this error for the next iteration
    adjustment_factor = some_algorithm(gradient, scalar)
    scalar *= adjustment_factor
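To make that concrete, this is roughly what I imagine some_algorithm boils down to if the gradient is estimated with a finite difference. The loss helper, the step size and the learning rate below are just placeholder guesses on my part, not something I've tested:
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [1, 2, 3, 4]

def loss(scalar):
    '''Run the black box once and score it against y_true.'''
    return mean_squared_error(y_true, BlackBox.run(scalar))

scalar = 0.01        # starting guess
step = 1e-4          # step used to estimate the slope (placeholder)
learning_rate = 0.1  # how far to move per iteration (placeholder)

for _ in range(1000):
    # Central finite-difference estimate of d(MSE)/d(scalar)
    slope = (loss(scalar + step) - loss(scalar - step)) / (2 * step)
    if abs(slope) < 1e-6:  # the curve is (almost) flat, so stop
        break
    scalar -= learning_rate * slope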
I'm happy to use an out-of-the-box solution such as sklearn classes, but it is the implementation that I'm getting stuck on.
Upvotes: 1
Views: 149
Reputation: 5024
The usual ML problem is to create a model that predicts well on some test dataset. But in your case you already have a model, as you said: "I have a large, black-box model, which I am trying to calibrate, ..."
So the task is to evaluate that model for different inputs and keep the one that scores best. Try the following approach.
from sklearn.metrics import mean_squared_error

y_true = [1, 2, 3, 4]
x_test = [0.1, 0.2, 0.3, 0.4]  # your input
y_pred = model(x_test)         # model is your black box
current_mse = mean_squared_error(y_true, y_pred)
The basic idea behind mse / mean_squared_error:
y_pred = [0.9, 1.8, 4.5, 2.8]
error = [1-0.9, 2-1.8, 3-4.5, 4-2.8] # 1, 2, 3 and 4 are from y_true
error = [0.1, 0.2, -1.5, 1.2]
squared_error = [0.01, 0.04, 2.25, 1.44] # 0.1*0.1, 0.2*0.2, (-1.5)*(-1.5), ...
mean_squared_error = sum(squared_error) / len(squared_error) # the mean is just the average
mean_squared_error = 3.74/4 = 0.935
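You can double-check that arithmetic with sklearn directly:
from sklearn.metrics import mean_squared_error

y_true = [1, 2, 3, 4]
y_pred = [0.9, 1.8, 4.5, 2.8]
print(mean_squared_error(y_true, y_pred))  # 0.935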
If you really want to feed the inputs one at a time, you can accumulate the squared errors yourself:
all_error = []
for testv, truev in zip(x_test, y_true):
    pred = BlackBox.run(testv)
    error = truev - pred
    squared_error = error * error
    all_error.append(squared_error)

mse = sum(all_error) / len(all_error)
if best_model is None:  # first time
    best_mse = current_mse
    best_model = current_model
elif current_mse < best_mse:
    best_mse = current_mse
    best_model = current_model
In the end you will have best_model and best_mse.
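Putting the pieces together with the sweep from your question, a minimal sketch could look like the following; the 0-to-2 range and the 0.01 step are only placeholders, so use whatever region scalar can sensibly occupy:
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [1, 2, 3, 4]

best_mse = None
best_scalar = None

for scalar in np.arange(0, 2, 0.01):  # candidate inputs
    y_pred = BlackBox.run(scalar)     # your black-box model
    current_mse = mean_squared_error(y_true, y_pred)
    if best_mse is None or current_mse < best_mse:
        best_mse = current_mse
        best_scalar = scalar

# best_scalar is the input with the lowest mse seen during the sweep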
Upvotes: 1