Reputation: 189
Consider a table that is created using the following code:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Reference Value' : [4.8, 2.4, 3.6, 0.6, 4.8, 5.4], 'True Result' : [8, 4, 6, 1, 8, 9]})
x = 1.5
df["Predicted Result"] = df['Reference Value'] * x
df["Error Squared"] = np.square(df["True Result"] - df["Predicted Result"])
which, when printed, looks as follows:
Reference Value True Result Predicted Result Error Squared
0 4.8 8 7.2 0.64
1 2.4 4 3.6 0.16
2 3.6 6 5.4 0.36
3 0.6 1 0.9 0.01
4 4.8 8 7.2 0.64
5 5.4 9 8.1 0.81
The total squared error is:
print("Total Error Squared: " + str(np.sum(df["Error Squared"])))
>> Total Error Squared: 2.6199999999999997
I am trying to change x such that the total error squared in the table is minimized. Ideally, after minimization, the table should look something like this:
Reference Value True Result Predicted Result Error Squared
0 4.8 8 8.0 0.0
1 2.4 4 4.0 0.0
2 3.6 6 6.0 0.0
3 0.6 1 1.0 0.0
4 4.8 8 8.0 0.0
5 5.4 9 9.0 0.0
with x being set to 1.6666
How can I achieve this through scipy or similar? Thanks
Upvotes: 0
Views: 141
Reputation: 7157
You can use scipy.optimize.minimize:
from scipy.optimize import minimize

ref_vals = df["Reference Value"].values
true_vals = df["True Result"].values

def obj(x):
    # total squared error for a given multiplier x
    return np.sum((true_vals - ref_vals * x)**2)

res = minimize(obj, x0=[1.0])
where res.x contains the solution 1.66666666.
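As a quick check, you can plug the fitted value back into the question's DataFrame (this sketch assumes df and np from the question are still in scope) and the total squared error drops to essentially zero:

x_opt = res.x[0]  # optimal multiplier, close to 5/3

df["Predicted Result"] = df["Reference Value"] * x_opt
df["Error Squared"] = np.square(df["True Result"] - df["Predicted Result"])

print(df)
print("Total Error Squared: " + str(np.sum(df["Error Squared"])))
# Total Error Squared should now be ~0 (a tiny residual due to the
# optimizer's tolerance and floating-point precision)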
Upvotes: 1