Chammu
Chammu

Reputation: 43

error in executing custom fit with nls function

I am trying to fit data with custom equation and i am getting warning msg and it is not getting executed.. and this is my r-code..

x <- c(3, 33, 146, 227, 342, 351, 353, 444, 556, 571, 709, 759, 836, 
860, 968, 1056, 1726, 1846, 1872, 1986, 2311, 2366, 2608, 2676, 
3098, 3278, 3288, 4434, 5034, 5049, 5085, 5089, 5089, 5097, 5324, 
5389, 5565, 5623, 6080, 6380, 6477, 6740, 7192, 7447, 7644, 7837, 
7843, 7922, 8738, 10089, 10237, 10258, 10491, 10625, 10982, 11175, 
11411, 11442, 11811, 12559, 12559, 12791, 13121, 13486, 14708, 
15251, 15261, 15277, 15806, 16185, 16229, 16358, 17168, 17458, 
17758, 18287, 18568, 18728, 19556, 20567, 21012, 21308, 23063, 
24127, 25910, 26770, 27753, 28460, 28493, 29361, 30085, 32408, 
35338, 36799, 37642, 37654, 37915, 39715, 40580, 42015, 42045, 
42188, 42296, 42296, 45406, 46653, 47596, 48296, 49171, 49416, 
50145, 52042, 52489, 52875, 53321, 53443, 54433, 55381, 56463, 
56485, 56560, 57042, 62551, 62651, 62661, 63732, 64103, 64893, 
71043, 74364, 75409, 76057, 81542, 82702, 84566, 88682)

y <- c(1:136)
df <- data.frame(x,y)

fit <- nls(y ~ a*(1-exp(-x/b))^c, data=df, start = list( a=100,b=1000,c=0.5),  
     algorithm="port",lower=list(a=100,b=100,c=0.5),upper=list(a=200,b=10000,c=2))

Warning messages:

1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

How to fit this data to this custom equation and how to find r-square value..

Thanks in advance..

Upvotes: 0

Views: 333

Answers (1)

jlhoward
jlhoward

Reputation: 59425

First, to answer your question directly, R2 is not a meaningful concept for non-linear fits. For linear models, R2 is the fraction of total variability in the dataset which is explained by the model. This calculation is only valid if SST = SSR + SSE, which is true for all linear models, but is not necessarily true for non-linear models. See this question for a more complete explanation and some additional references.

So, while one can access R2 for a linear model as

rsq <- summary(lm(...))$r.squared

the summary(...) method for non-linear models does not return an R2 value.

Second, you really need to get into the habit of plotting your fitted curve against your data.

plot(x,y)
lines(x,predict(fit))

There's just no way to interpret this as a "good" fit. If we re-run the model without the constraints on a, b, and c we got a much better fit:

fit.2 <- nls(y ~ a*(1-exp(-x/b))^c, data=df, start = list( a=100,b=1000,c=0.5),  
           algorithm="port")

par(mfrow=c(1,2))
plot(x,y, main="Constrained Model",cex=0.5)
lines(x,predict(fit), col="red")
plot(x,y, main="UNconstrained Model",cex=0.5)
lines(x,predict(fit.2), col="red")

Clearly this model is "better", but that doesn't mean it's "good". Among other things, we need to look at the residuals. For a well-fit model, the residuals should not depend on x. So let's see:

plot(y,residuals(fit),main="Residuals: 1st Model")
plot(y,residuals(fit.2),main="Residuals: 2ndd Model")

The residuals in the second model are much smaller and do not follow a trend, although there does seem to be some underlying structure. This would be something to look at - it implies that there might be some low amplitude oscillation either due to a real effect, or perhaps due to the method of data collection.

Also, the underlying principle of regression modeling (the starting assumption) is that the residuals are normally distributed with constant variance (e.g., the variance does not depend on x). We can check this with a Q-Q plot, which plots quantiles of your (standardized) residuals against quantiles of N(0,1). If the residuals are normally distributed, this should be a straight line.

se <- summary(fit)$sigma
qqnorm(residuals(fit)/se, main="Q-Q Plot, 1st Model")
qqline(residuals(fit)/se,probs=c(0.25,0.75))
se.2 <- summary(fit.2)$sigma
qqnorm(residuals(fit.2)/se.2, main="Q-Q Plot, 2nd Model")
qqline(residuals(fit.2)/se.2,probs=c(0.25,0.75))

As you can see, the residuals from the 2nd model are very nearly normally distributed. IMO this, more than anything else, suggests that the second model is a "good" model.

Finally, your data looks to me like the cdf of a distribution with 2 peaks. One very simple way to model data like that is as:

y ~ a1 × ( 1 - exp( -x/b1) ) + a2 × ( 1 - exp( -x/b2) )

When I try this model it is a little better than the unconstrained version of your model.

Upvotes: 1

Related Questions