Jan

Reputation: 97

Gnuplot Curve Fitting With Time-Offset

I have an issue with curve fitting in Gnuplot. My data has a time axis that starts at 0.5024. I want to fit a value M over time with a linear sin/cos combination, M = a + b*sin(w*t) + c*cos(w*t). For further processing I only need the value of c. My code is

f(x)=a+b*sin(w*x)+c*cos(w*x)
fit f(x) "data.dat" using 1:2 via a,b,c,w

The asymptotic standard error is 66% for parameter c, which seems quite high. I suspect this is because the time starts at 0.5024 instead of 0. What I could do, of course, is

fit f(x) "data.dat" using ($1-0.5024):2 via a,b,c,w

which gives an asymptotic error of about 10%, which is much lower. The question is: Can I do that? Does my new plot with the time offset still represent the original curve? Any other ideas?

Thanks in advance for your help :-)

Upvotes: 0

Views: 649

Answers (2)

sweber

Reputation: 2976

It's a bit difficult to answer this without having seen your data, but your observation is typical.

The problem is an effect of the fit itself, or even of your formula. Let me explain it using an example data set. (Well, this will become off-topic...)

A statistics excursion

The data follows the function f(x)=x, and all y-values have been shifted by Gaussian random numbers. In addition, the data lies in the x-range [600:800].

You can now simply apply a linear fit f(x)=m*x+b. According to Gaussian error propagation, the error is df(x)=sqrt((dm*x)²+(db)²), where dm and db are the errors of m and b. So, you can plot the data, the linear function and the error margin f(x) +/- df(x).
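A minimal gnuplot sketch of this step, assuming the example data sits in a two-column file. The file name "line.dat", the starting values and the plot range are placeholders and not part of the original example:

```gnuplot
# "line.dat" is a hypothetical two-column file (x, y) standing in for the example data.
set fit errorvariables          # store the parameter errors as m_err and b_err

f(x) = m*x + b
m = 1.0; b = 0.0                # starting values for the fit
fit f(x) "line.dat" using 1:2 via m, b

# Error margin as in the formula above (any correlation between m and b is ignored)
df(x) = sqrt((m_err*x)**2 + b_err**2)

set xrange [0:900]              # wide enough to show both the y-axis and the data
plot "line.dat" using 1:2 title "data", \
     f(x)         title "fit", \
     f(x) + df(x) title "fit + df", \
     f(x) - df(x) title "fit - df"
```

With `set fit errorvariables`, gnuplot creates a variable named after each fitted parameter with the suffix `_err`, which is what `df(x)` uses here.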

Here is the result:

[Figure: example data with the linear fit and the error margin f(x) +/- df(x)]

The parameters:

m = 0.981822 +/- 0.1212 (12.34%)
b = 0.974375 +/- 85.13  (8737%)

The correlation matrix:

           m      b
m          1.000
b         -0.997  1.000

You may notice three things:

  1. The error for b is very large!
  2. The error margin is small at x=0, but increases with x. Shouldn't it be smallest where the data is, i.e. at x=700?
  3. The correlation between m and b is -0.997, which is near the maximum (absolute) value of 1.

The third point can be understood from the plot: If you increase the slope m, the y-offset b has to decrease. Both parameters are strongly correlated, and an error in one of them is passed on to the other!

From statistics you may know that a linear regression line always goes through the center of gravity (cog) of your data. So, let's shift the data so that the cog is at the origin (it would be enough to shift it so that the cog lies on the y-axis, but I moved it to the origin).

[Figure: shifted data with the linear fit and its error margin]

Result:

m = 1.0465   +/- 0.1211 (11.57%)
b = -12.0611 +/- 7.027  (58.26%)

Correlation:

           m      b 
m          1.000 
b         -0.000  1.000 

Compared to the first plot, the value and error of m are almost the same, but the very large error of b is much smaller now. The reason is that m and b are no longer correlated, so a (tiny) variation of m does not cause a (very big) variation of b. It is also nice to see that the error margin has shrunk a lot.

Here is a last plot with the original data, the first fit function and the fit function from the shifted data, shifted back to the original coordinates:

[Figure: original data with the first fit function and the back-shifted fit function]
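The shift and the back-shift could be sketched in gnuplot roughly as follows. This is only an illustration: "line.dat" is a hypothetical two-column file, and the names f1, f2, g, x0 and y0 are chosen here, not taken from the original example.

```gnuplot
# "line.dat" is a hypothetical two-column file (x, y).
set fit errorvariables                  # gives m1_err, b1_err, m2_err, b2_err

# Center of gravity of the data
stats "line.dat" using 1:2 nooutput
x0 = STATS_mean_x
y0 = STATS_mean_y

# Fit 1: data as it is
f1(x) = m1*x + b1
m1 = 1.0; b1 = 0.0
fit f1(x) "line.dat" using 1:2 via m1, b1

# Fit 2: data shifted so that the cog is at the origin
f2(x) = m2*x + b2
m2 = 1.0; b2 = 0.0
fit f2(x) "line.dat" using ($1 - x0):($2 - y0) via m2, b2

# Shift the second fit back into the original coordinates
g(x) = f2(x - x0) + y0

plot "line.dat" using 1:2 title "data", \
     f1(x) title "fit to original data", \
     g(x)  title "fit to shifted data, shifted back"
```

Only the `using` expression changes between the two fits; the data file itself is never modified.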

About your fit function:

First, there is a big correlation problem: b and c are extremely correlated, as both together define the phase and amplitude of your oscillation. It would help a lot to use another, equivalent function:

f(x)=a+N*sin(w*x+p)

Here, phase and amplitude are separated. You can still calculate your c from the fit results, and I would guess that its error will be much better.
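For reference, the two forms are linked by the angle-addition identity, which gives c directly from N and p. The error estimate below is a sketch that assumes N and p are uncorrelated (the same simplification as in the error formula of the linear example); dN and dp stand for the errors reported by the fit:

```latex
\begin{aligned}
a + N\sin(wx + p) &= a + \underbrace{N\cos p}_{b}\,\sin(wx) + \underbrace{N\sin p}_{c}\,\cos(wx),\\
c &= N\sin p,\\
dc &\approx \sqrt{(\sin p \; dN)^2 + (N\cos p \; dp)^2}.
\end{aligned}
```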

As in my example, if the data is far away from the y-axis, a small variation of w will have a big impact on p. So, I would suggest shifting your data so that its cog is on the y-axis, which gets rid of most of this effect.
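A hedged sketch of how this could look for the data in the question. The starting values are placeholders (w in particular needs a sensible guess for your data), and t0, c_shift and c_orig are names introduced here for illustration:

```gnuplot
set fit errorvariables                  # gives a_err, N_err, w_err, p_err

# Mean of the time column, used as the shift
stats "data.dat" using 1 nooutput
t0 = STATS_mean

f(x) = a + N*sin(w*x + p)
a = 0.0; N = 1.0; w = 1.0; p = 0.0      # placeholder starting values
fit f(x) "data.dat" using ($1 - t0):2 via a, N, w, p

# c = N*sin(p) is the cos coefficient in the shifted time coordinate.
# In the original time coordinate the phase is p - w*t0.
c_shift = N*sin(p)
c_orig  = N*sin(p - w*t0)
print sprintf("c (shifted time)  = %g", c_shift)
print sprintf("c (original time) = %g", c_orig)

# The fitted curve in the original coordinates
plot "data.dat" using 1:2 title "data", f(x - t0) title "fit"
```

Note that c depends on where t=0 is: the c of a fit in shifted time and the c of a fit in original time are different numbers describing the same curve, so it matters which of the two your further processing expects.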

Is this shift allowed?

Yes. You do not alter the data, you simply change your coordinate system to get better errors. Also, the fit function should describe the data, so it should be very accurate in the range where your data is. In my first plot, the highest accuracy is at the y-axis, not where the data is.

Important

You should always state which tricks you applied. Otherwise, someone may check your results, fit the data without the tricks, see the red curve instead of your green one, and accuse you of cheating...

Upvotes: 1

Miguel

Reputation: 7627

Whether you can do that or not depends on whether the curve you're fitting to represents the physical phenomena you're studying and is consistent with the physical model you need to comply with. My suggestion is that you provide those and ask this question again in a physics forum (or chemistry, biology, etc., depending on your field).

Upvotes: 0
