user144153

Reputation: 858

Error propagation in a linear fit using python

Let's say I take multiple measurements of some dependent variable y relative to some independent variable x. I also record the uncertainty in each measurement, dy. As an example, this might look like:

import numpy as np
x = np.array([1, 2, 3, 4])
y = np.array([4.1, 5.8, 8.1, 9.7])
dy = np.array([0.2, 0.3, 0.2, 0.4]) 

Now assume I expect the measured values to obey a linear relationship y = mx + b, and I want to determine the y value y_unm for some unmeasured x value x_unm. I can perform a linear fit in Python pretty easily if I don't consider the error:

fit_params, residuals, rank, s_values, rcond = np.polyfit(x, y, 1, full=True)
poly_func = np.poly1d(fit_params)

x_unm = 2.5  # Some unmeasured x value (example value)
y_unm = poly_func(x_unm)  # The predicted y value at x_unm

I have two problems with this approach. The first is that np.polyfit does not incorporate the error on each point. The second is that I have no idea what the uncertainty on y_unm is.

Does anyone know how to fit data with uncertainties in a way that would allow me to determine the uncertainty in y_unm?

Upvotes: 5

Views: 3365

Answers (1)

JJR4

Reputation: 466

This is a problem that can be solved analytically, but it is perhaps better suited to a math/statistics discussion. For example, see (among many sources):

https://www.che.udel.edu/pdf/FittingData.pdf

The error in the fit can be calculated analytically. It is important to note though that the fit itself is different when accounting for errors in the measurements.
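For reference, the closed-form weighted least-squares solution (as derived in notes like the one linked above) comes from minimizing the chi-squared function with respect to m and b:

```latex
\chi^2 = \sum_i \left(\frac{y_i - m x_i - b}{\sigma_i}\right)^2,
\qquad
S_1 = \sum_i \frac{1}{\sigma_i^2},\;
S_x = \sum_i \frac{x_i}{\sigma_i^2},\;
S_{xx} = \sum_i \frac{x_i^2}{\sigma_i^2},\;
S_y = \sum_i \frac{y_i}{\sigma_i^2},\;
S_{xy} = \sum_i \frac{x_i y_i}{\sigma_i^2}
```

Setting the derivatives of chi-squared to zero gives the normal equations, whose solution is

```latex
m = \frac{S_1 S_{xy} - S_x S_y}{\Delta},
\qquad
b = \frac{S_{xx} S_y - S_x S_{xy}}{\Delta},
\qquad
\Delta = S_1 S_{xx} - S_x^2
```

These are exactly the sums (sx, sx2, s1, sy, sxy) used in the analytic solution in the code below.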

In Python I am not aware of a built-in function that handles errors, but here is an example of a chi-squared minimization using scipy.optimize.fmin:

from scipy.optimize import fmin

# Chi^2 function to minimize
def chi_2(params, x, y, sigy):
    m, c = params
    return sum(((y - m*x - c)/sigy)**2)

data_in = (x, y, dy)   # x, y, dy as defined in the question
params0 = [1, 0]       # initial guess for [m, c]

q = fmin(chi_2, params0, args=data_in)
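As it turns out, there is also a built-in that accepts measurement errors directly: scipy.optimize.curve_fit takes a `sigma` argument, and with `absolute_sigma=True` it treats the dy values as absolute standard deviations and returns a parameter covariance matrix in absolute units:

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([1, 2, 3, 4])
y = np.array([4.1, 5.8, 8.1, 9.7])
dy = np.array([0.2, 0.3, 0.2, 0.4])

def line(x, m, b):
    return m * x + b

# sigma weights each residual by its uncertainty; absolute_sigma=True
# keeps the covariance matrix in absolute (not rescaled) units.
popt, pcov = curve_fit(line, x, y, sigma=dy, absolute_sigma=True)
m, b = popt
dm, db = np.sqrt(np.diag(pcov))  # 1-sigma parameter uncertainties
```

This reproduces the weighted chi-squared result below, and pcov is what you need to propagate the uncertainty to a prediction.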

For comparison I ran this, your polyfit solution, and the analytic solution on the data you gave, and plotted the results.

The results for the parameters from the given techniques:

Weighted Chi-squared with fmin: m=1.94609996 b=2.1312239

Analytic: m=1.94609929078014 b=2.1312056737588647

Polyfit: m=1.91 b=2.15

[Plot: linear fits to the given data]

Here is the full code:

import numpy as np
from scipy.optimize import fmin
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4])
y = np.array([4.1, 5.8, 8.1, 9.7])
dy = np.array([0.2, 0.3, 0.2, 0.4]) 

#Calculate Chi^2 function to minimize
def chi_2(params,x,y,sigy):
    m,c=params
    return sum(((y-m*x-c)/sigy)**2)

data_in=(x,y,dy)
params0=[1,0]

q=fmin(chi_2,params0,args=data_in)

#Unweighted fit to compare

a=np.polyfit(x,y,deg=1)

#Analytic weighted least-squares solution
sx  = sum(x/dy**2)
sx2 = sum(x**2/dy**2)
s1  = sum(1./dy**2)
sy  = sum(y/dy**2)
sxy = sum(x*y/dy**2)

ma = (s1*sxy - sx*sy)/(s1*sx2 - sx**2)
ba = (sx2*sy - sx*sxy)/(sx2*s1 - sx**2)

xplt = np.linspace(0, 5, 100)
yplt1 = xplt*q[0] + q[1]   # weighted fit from fmin
yplt2 = xplt*a[0] + a[1]   # unweighted polyfit
yplt3 = xplt*ma + ba       # analytic weighted fit

plt.figure()
plt.plot(xplt,yplt1,label='Error Weighted',color='black')
plt.plot(xplt,yplt2,label='Non-Error Weighted',color='blue')
plt.plot(xplt,yplt3,label='Error Weighted Analytic',linestyle='--',color='red')
plt.errorbar(x,y,yerr=dy,fmt='ko')
plt.legend()
plt.show()
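To get the uncertainty on y_unm that the question asks about, propagate the parameter covariance through y = mx + b. The parameter variances and covariance follow from the same weighted sums used in the analytic fit above; the variance of the prediction is Var(y) = x² Var(m) + Var(b) + 2x Cov(m, b). A sketch (the value x_unm = 2.5 is just an example):

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([4.1, 5.8, 8.1, 9.7])
dy = np.array([0.2, 0.3, 0.2, 0.4])

# Weighted sums, as in the analytic fit
s1  = np.sum(1.0/dy**2)
sx  = np.sum(x/dy**2)
sx2 = np.sum(x**2/dy**2)
sy  = np.sum(y/dy**2)
sxy = np.sum(x*y/dy**2)

delta = s1*sx2 - sx**2
m = (s1*sxy - sx*sy)/delta
b = (sx2*sy - sx*sxy)/delta

# Parameter variances and covariance from the normal equations
var_m  = s1/delta
var_b  = sx2/delta
cov_mb = -sx/delta

# Propagate to the prediction at an unmeasured point
x_unm = 2.5  # example value -- substitute your own
y_unm = m*x_unm + b
dy_unm = np.sqrt(x_unm**2*var_m + var_b + 2*x_unm*cov_mb)
```

Note that the covariance term is negative here and partially cancels the other two, so the prediction near the center of the data is tighter than the individual parameter errors might suggest.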

Upvotes: 1
