Reputation: 23
I'm trying to run a non linear regression on this data:
Flux<-c(192.09536, 199.47616, 137.63245, 133.60358, -89.28360, -23.17639, -27.14659, 107.25287, 52.72565, NA, 167.43277, 113.59047)
Par<-c(4.166667e-01, 4.347826e-02, 4.583333e-01, 1.845833e+02, 1.122688e+03, 1.059048e+03, 6.384000e+02, 3.326087e+02, 7.094762e+02, 4.180000e+02, 3.953333e+02, 3.998636e+02)
Obs<-c(1,2,3,4,5,6,7,8,9,10,11,12)
curve1<-data.frame(Flux, Par, Obs)
curve1<-do.call("cbind", curve1)
This is the first model I tried, which has worked on some other similar datasets:
model1 <- nls(Flux~b*Par/(c+Par)-a, data = curve1, start=list(a=180,
b=-200, c=800))
However for this data model1 gives:
Error in nls(Flux ~ b * Par/(c + Par) - a, data = curve1, start =
list(a = 180, : singular gradient
I thought this might be because my starting parameters are wrong so I tried to turn it into a self starting model (I've also tried lots of different starting parameters):
model2<-with(curve1, nls(Flux~SSasymp(Par, a, b, c)))
This gives the same error. However I think I have used SSasymp wrong in this case because it fits incorrect curves to data which I am able to fit model1 to. I think this is because I've confused R about a, b and c (?). I've read that when using SSasymp: b is 'the horizontal asymptote (a) -the response when x is 0' while c is the rate constant.
In my original equation in model1 b is the horizontal asymptote, c is the rate constant and a is the response when x is 0.
If I try to write a self starting model to reflect this:
model3<-with(curve1, nls(Flux~SSasymp(Par, b, (b-a), c)))
I get this error: In addition: Warning message: In nls(Flux ~ SSasymp(Par, b, (b - a), c)) : No starting values specified for some parameters. Initializing ‘a’ to '1.'. Consider specifying 'start' or using a selfStart model
I am looking for advice on 1) Is model1 not working because of an error in my code/incorrect starting parameters or because the model just doesn't fit the data?
If the latter is the case, is there a way to force R to do its best to fit a non-linear model to it anyway? Ecologically, this should really be a saturating curve.
2) Can I/How do I fit my equation into a self-starting model? Have I just fundamentally misunderstood how to use SSasymp?
Any help is very much appreciated. Apologies if I haven't explained or formatted this very well, this is my first post and I'm not an experienced R user or statistician!
Upvotes: 2
Views: 4770
Reputation: 59335
Something like this?
model1<-nls(Flux~b*Par/(c+Par)-a, data = curve1, start=list(a=180, b=-200, c=-2000))
plot(Flux~Par,curve1)
curve(predict(model1,newdata=data.frame(Par=x)),add=TRUE)
summary(model1)
# Formula: Flux ~ b * Par/(c + Par) - a
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# a -179.17 22.86 -7.837 5.06e-05 ***
# b 1009.36 2556.44 0.395 0.703
# c -5651.20 11542.41 -0.490 0.638
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 42.43 on 8 degrees of freedom
# ...
Your data is somewhat pathological. Functions of the form
y = b * x / (c+x)
are concave up when b < 0
and c > 0
; they are concave down when b > 0
and c < 0
, providing |c| > max(x)
(otherwise there's a vertical asymptote, as demonstrated in one of the comments). Because your data is "nearly" linear, and there is substantial scatter, the best fit (e.g., the set of parameters a, b, and c which minimize the residual sum-of-squares), is concave down (c < 0
). So if you start with an estimate of c < -max(x)
, you get convergence.
Now, I gather from your question that c
has some physical meaning which requires it to be > 0. The problem here is that your model is over-specified (too many parameters). In a saturating curve, the rate constant is determined from the curvature. But in your case, there is no curvature (or, if any, it is negative), so you cannot determine a rate constant. Mathematically, for x << c
b * x / (c + x) ~ (b/c) * x
in your case the slope is about -0.25, so b/c ~ -0.25
. But there are infinitely many values of b
and c
which yield this ratio. So, while you know a lot about the ratio b/c
, you know nothing at all about b
or c
individually. This is why the standard error in those parameters is so large in the fit above (and the p-values are so high).
The bottom line is that, in this specific case, you have insufficient data to determine a, b, and c separately, with any accuracy.
[Two minor points]
NA
s in your data has nothing to do with this -
nls(...)
removes rows containing NA
s by default.curve1<-do.call("cbind", curve1)
Upvotes: 1