Ruan

Reputation: 189

Minimize total error squared column of table by changing a variable (Python)

Consider a table that is created using the following code:

import pandas as pd
import numpy as np    

df = pd.DataFrame({'Reference Value' : [4.8, 2.4, 3.6, 0.6, 4.8, 5.4], 'True Result' : [8, 4, 6, 1, 8, 9]})
x = 1.5
df["Predicted Result"] = df['Reference Value'] * x
df["Error Squared"] = np.square(df["True Result"] - df["Predicted Result"])

When printed, the table looks as follows:

   Reference Value  True Result  Predicted Result  Error Squared
0              4.8            8               7.2           0.64
1              2.4            4               3.6           0.16
2              3.6            6               5.4           0.36
3              0.6            1               0.9           0.01
4              4.8            8               7.2           0.64
5              5.4            9               8.1           0.81

The total squared error is:

print("Total Error Squared: " + str(np.sum(df["Error Squared"])))
>> Total Error Squared: 2.6199999999999997

I am trying to change x such that the total error squared in the table is minimized. Ideally, after minimization, the table should look something like this:

   Reference Value  True Result  Predicted Result  Error Squared
0              4.8            8               8.0            0.0
1              2.4            4               4.0            0.0
2              3.6            6               6.0            0.0
3              0.6            1               1.0            0.0
4              4.8            8               8.0            0.0
5              5.4            9               9.0            0.0

with x set to approximately 1.6667.

How can I achieve this through scipy or similar? Thanks

Upvotes: 0

Views: 141

Answers (1)

joni

Reputation: 7157

You can use scipy.optimize.minimize:

from scipy.optimize import minimize

ref_vals = df["Reference Value"].values
true_vals = df["True Result"].values

def obj(x):
    return np.sum((true_vals - ref_vals * x)**2)

res = minimize(obj, x0=[1.0])

Here res.x is a one-element array holding the solution, approximately 1.6667 (use res.x[0] to get it as a scalar).
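Because the objective is a sum of squares that is linear in x, the minimizer can also be written in closed form: setting the derivative of sum((true - ref * x)**2) to zero gives x = sum(ref * true) / sum(ref**2). The sketch below is one possible way to apply that and write the result back into the table from the question; the names x_opt and total_error are illustrative, not part of the original code.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "Reference Value": [4.8, 2.4, 3.6, 0.6, 4.8, 5.4],
    "True Result": [8, 4, 6, 1, 8, 9],
})

ref_vals = df["Reference Value"].to_numpy()
true_vals = df["True Result"].to_numpy()

# Closed-form least-squares solution for a single scale factor:
# x = sum(ref * true) / sum(ref**2)
x_opt = np.dot(ref_vals, true_vals) / np.dot(ref_vals, ref_vals)

# Recompute the derived columns with the optimized x.
df["Predicted Result"] = df["Reference Value"] * x_opt
df["Error Squared"] = np.square(df["True Result"] - df["Predicted Result"])
total_error = df["Error Squared"].sum()

print("x:", x_opt)
print("Total Error Squared:", total_error)
```

For this data the closed form should agree with the value found by minimize up to the optimizer's tolerance. The numerical approach remains the more general choice when the prediction is not linear in x.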

Upvotes: 1
