Reputation: 858
Lets say I take multiple measurements of some dependent variable y
relative to some independent variable x
. I also record the uncertainty in each measurement dy
. As an example this may look like
import numpy as np
x = np.array([1, 2, 3, 4])
y = np.array([4.1, 5.8, 8.1, 9.7])
dy = np.array([0.2, 0.3, 0.2, 0.4])
Now assume I expect the measured values to obey a linear relationship y = mx + b
and I want to determine the y value y_umn
for some unmeasured x value x_unm
. I can perform a linear fit in Python pretty easily if I don't consider the error:
fit_params, residuals, rank, s_values, rcond = np.polyfit(x, y, 1, full=True)
poly_func = np.poly1d(fit_params)
x_unm # The unmeasured x value
y_unm = poly_func(x_unm) # The unmeasured x value
I have two problems with this approach. First is that np.polyfit
does not incorporate the error on each point. Second is that I have no idea what the uncertainty on y_unm
is.
Does anyone know how to fit data with uncertainties in a way that would allow me to determine the uncertainty in y_unm
?
Upvotes: 5
Views: 3365
Reputation: 466
This is a problem that can be done analytically, but that is perhaps better suited as a math/statistics discussion. For example see (among many sources):
https://www.che.udel.edu/pdf/FittingData.pdf
The error in the fit can be calculated analytically. It is important to note though that the fit itself is different when accounting for errors in the measurements.
In python I am not sure of a built in function that handles errors but here is an example of doing a chi-squared minimization using scipy.optimize.fmin
#Calculate Chi^2 function to minimize
def chi_2(params,x,y,sigy):
m,c=params
return sum(((y-m*x-c)/sigy)**2)
data_in=(x,y,dy)
params0=[1,0]
q=fmin(chi_2,params0,args=data_in)
For comparison I used this, your polyfit solution, and the analytic solution and plotted for the data you gave.
The results for the parameters from the given techniques:
Weighted Chi-squared with fmin: m=1.94609996 b=2.1312239
Analytic: m=1.94609929078014 b=2.1312056737588647
Polyfit: m=1.91 b=2.15
Here is the full code:
import numpy as np
from scipy.optimize import fmin
import matplotlib.pyplot as plt
x = np.array([1, 2, 3, 4])
y = np.array([4.1, 5.8, 8.1, 9.7])
dy = np.array([0.2, 0.3, 0.2, 0.4])
#Calculate Chi^2 function to minimize
def chi_2(params,x,y,sigy):
m,c=params
return sum(((y-m*x-c)/sigy)**2)
data_in=(x,y,dy)
params0=[1,0]
q=fmin(chi_2,params0,args=data_in)
#Unweighted fit to compare
a=np.polyfit(x,y,deg=1)
#Analytic solution
sx=sum(x/dy**2)
sx2=sum(x**2/dy**2)
s1=sum(1./dy**2)
sy=sum(y/dy**2)
sxy=sum(x*y/dy**2)
ma=(s1*sxy-sx*sy)/(s1*sx2-sx**2)
ba=(sx2*sy-sx*sxy)/(sx2*s1-sx**2)
xplt=np.linspace(0,5,100)
yplt1=xplt*q[0]+q[1]
yplt2=xplt*a[0]+a[1]
yplt3=xplt*ma+ba
plt.figure()
plt.plot(xplt,yplt1,label='Error Weighted',color='black')
plt.plot(xplt,yplt2,label='Non-Error Weighted',color='blue')
plt.plot(xplt,yplt3,label='Error Weighted Analytic',linestyle='--',color='red')
plt.errorbar(x,y,yerr=dy,fmt='ko')
plt.legend()
plt.show()
Upvotes: 1