Reputation: 67
I have the following dataframe:
x y error_on_y
1 1.2 0.1
2 0.87 0.23
4 1.12 0.11
5 0.75 0.06
5 0.66 0.15
6 0.98 0.08
7 1.34 0.05
7 2.86 0.12
With this frame I want to use np.polyfit to fit a regression line. I've fitted the line using:
x = np.array(dataframe['x'])
y = np.array(dataframe['y'])
y_err = np.array(dataframe['error_on_y'])
np.polyfit(x,y,deg=1, w=1/y_err, cov=True)
However, I can't find how to plot this fit together with the errors I defined. So far, the only examples of plotting with np.polyfit that I found were fits without a specified weight.
Does anyone know how I could plot this line, or know of a link to a decent example? I have not been able to find one, and I have been looking for quite a while now, so any expertise on the matter would be very welcome and much appreciated!
EDIT/Clarification:
When the weight (w) is not defined, the polyfit function returns a single vector with the coefficients that minimise the squared error. However, with the call below, which defines w (and also passes cov=True), a second array is returned as well:
np.polyfit(x,y,deg=1, w=1/y_err, cov=True)
output:
(array([0.00097481, 0.82290694]), array([[ 4.75261249e-09, -2.28408710e-07],
[-2.28408710e-07, 1.41696109e-05]]))
EDIT/additional info:
After finding this link (https://peteris.rocks/blog/extrapolate-lines-with-numpy-polyfit/), I saw that without a defined weight the polyfit function only returns the first array, i.e.
vector = array([0.00097481, 0.82290694])
In the line function y = mx + b, m = vector[0] and b = vector[1], i.e. m is the slope and b is the intercept. I assume this means the additional array in the example above is the result of the weight defined in the function.
I am trying to find out how I should interpret/plot this with the weights included :)
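For what it's worth, the second array does not come from w. It is returned because cov=True was passed, and it is the covariance matrix of the fitted coefficients. A quick check, using the data from the table above (the variable names are my own):

```python
import numpy as np

# Data copied from the table in the question.
x = np.array([1, 2, 4, 5, 5, 6, 7, 7])
y = np.array([1.2, 0.87, 1.12, 0.75, 0.66, 0.98, 1.34, 2.86])
y_err = np.array([0.1, 0.23, 0.11, 0.06, 0.15, 0.08, 0.05, 0.12])

# Weights but no cov: a single array of coefficients [m, b].
coeffs_only = np.polyfit(x, y, deg=1, w=1 / y_err)

# cov=True, with or without weights: a tuple (coefficients, covariance matrix).
coeffs, cov = np.polyfit(x, y, deg=1, w=1 / y_err, cov=True)
coeffs_unweighted, cov_unweighted = np.polyfit(x, y, deg=1, cov=True)

print(coeffs_only.shape)     # shape of the coefficient array
print(cov.shape)             # shape of the covariance matrix
print(cov_unweighted.shape)  # also a matrix, even though no w was given
```

So the weights change the values of the coefficients, while cov=True decides whether the extra matrix is returned.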
POSSIBLE ANSWER: I've found the following:
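import numpy as np
import matplotlib.pyplot as plt

new = np.polyfit(x, y, deg=1, w=1/y_err, cov=True)
m, b = new[0]      # first array: slope and intercept
a, c = new[1][0]   # first row of the second array
d, e = new[1][1]   # second row of the second array
print(m, b, a, c, d, e)
for i in range(min(x), max(x)):
    plt.plot(i, i * m + b, 'go')              # fitted line
    plt.plot(i, i * (m + a) + (b + c), 'bo')  # line shifted up by the first row
    plt.plot(i, i * (m - d) + (b - e), 'ro')  # line shifted down by the second row
plt.show()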
In this example I assumed that the first array in
(array([0.00097481, 0.82290694]), array([[ 4.75261249e-09, -2.28408710e-07],
[-2.28408710e-07, 1.41696109e-05]]))
contains the coefficients of the fitted regression line, and that the following two arrays are the errors on the regression line. This is what I tried, and I believe it makes sense. It is not definitive, though, so I'll leave the post open for comments, remarks and better solutions.
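A note on that assumption: with cov=True the second array is the covariance matrix of (m, b), so its rows are not offsets that can be added to the slope and intercept directly. The usual reading is that the square roots of the diagonal are the 1-sigma errors on m and b. Below is a minimal sketch of that interpretation, using the data from the table above; the names m_err, b_err, lower and upper are my own, and the band uses standard linear error propagation, which is an assumption about what kind of "error on the line" is wanted.

```python
import numpy as np

# Data copied from the table in the question.
x = np.array([1, 2, 4, 5, 5, 6, 7, 7])
y = np.array([1.2, 0.87, 1.12, 0.75, 0.66, 0.98, 1.34, 2.86])
y_err = np.array([0.1, 0.23, 0.11, 0.06, 0.15, 0.08, 0.05, 0.12])

coeffs, cov = np.polyfit(x, y, deg=1, w=1 / y_err, cov=True)
m, b = coeffs

# 1-sigma uncertainties on slope and intercept: sqrt of the diagonal.
m_err, b_err = np.sqrt(np.diag(cov))

# Fitted line on a dense grid.
x_plot = np.linspace(x.min(), x.max(), 100)
y_fit = m * x_plot + b

# Variance of m*x + b, including the slope/intercept correlation term.
y_var = x_plot**2 * cov[0, 0] + 2 * x_plot * cov[0, 1] + cov[1, 1]
y_sigma = np.sqrt(y_var)
lower, upper = y_fit - y_sigma, y_fit + y_sigma

print(f"m = {m:.4f} +/- {m_err:.4f}")
print(f"b = {b:.4f} +/- {b_err:.4f}")
```

The band could then be drawn with plt.fill_between(x_plot, lower, upper, alpha=0.3). Also note that, by default, polyfit rescales the covariance by the reduced chi-square; if the values in error_on_y are trusted as absolute errors, cov='unscaled' may be the more appropriate setting.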
Upvotes: 3
Views: 2363
Reputation: 6091
When you do np.polyfit(x,y,deg=1, w=1/y_err, cov=True)
you're calculating (among other things) the coefficients of a polynomial. To easily manipulate such coefficients you can create a polynomial object
coeffs, mycov = np.polyfit(x,y,deg=1, w=1/y_err, cov=True)
p = np.poly1d(coeffs)
and plot it using
x_plot = np.linspace(1, 7, 100)
plt.plot(x_plot, p(x_plot))
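To also show the errors on y alongside the fitted line, something like the following should work. It is only a sketch: plot_weighted_fit is a hypothetical helper name of my own, the data is copied from the question, and matplotlib's errorbar is used for the error bars.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_weighted_fit(x, y, y_err):
    """Plot the data with error bars and the weighted straight-line fit."""
    # Weighted fit; without cov=True only the coefficients are returned.
    coeffs = np.polyfit(x, y, deg=1, w=1 / y_err)
    p = np.poly1d(coeffs)

    x_plot = np.linspace(x.min(), x.max(), 100)
    fig, ax = plt.subplots()
    ax.errorbar(x, y, yerr=y_err, fmt="o", capsize=3, label="data")
    ax.plot(x_plot, p(x_plot), label="weighted fit")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    return fig, ax


# Data copied from the table in the question.
x = np.array([1, 2, 4, 5, 5, 6, 7, 7])
y = np.array([1.2, 0.87, 1.12, 0.75, 0.66, 0.98, 1.34, 2.86])
y_err = np.array([0.1, 0.23, 0.11, 0.06, 0.15, 0.08, 0.05, 0.12])

fig, ax = plot_weighted_fit(x, y, y_err)
```

Call plt.show() afterwards (or fig.savefig with a file name) to display or save the figure.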
Upvotes: 1