Reputation: 5129
I want to fit a function in the increase form of exponential decay (or asymptotic curve), such that:
Richness = C*(1-exp(k*Abundance)) # k < 0
I've read on this page about expn()
function, but simply can't find it (or a nls
package). All I found was a nlstools package, but it has no expn()
. I tried with the usual nls
and exp
function, but I only get increasing exponentials...
I want to fit the graph like below (drawn in Paint), and I don't know where the curve should stabilize (Richness = C). Thanks in advance.
Upvotes: 3
Views: 6924
Reputation: 2719
The increasing form of exponential decay function looks like this:
y = a + b*(1-e^(-x * k))
You can manually change the coefficients (a, b, k) to understand how they affect the resulting curves:
library(tidyverse)
# increase exponent factor (k) -> increase initial growth rate
data.frame(x= seq(0,250,.5)) %>%
mutate(y0 = 0 + 60*(1-exp(-x*.00)),
y1 = 0 + 60*(1-exp(-x*.01)),
y2 = 0 + 60*(1-exp(-x*.02)),
y3 = 0 + 60*(1-exp(-x*.03))) %>%
ggplot() +
geom_abline() +
geom_point(aes(y=y0, x=x, color='k=.00')) +
geom_point(aes(y=y1, x=x, color='k=.01')) +
geom_point(aes(y=y2, x=x, color='k=.02')) +
geom_point(aes(y=y3, x=x, color='k=.03')) +
scale_color_manual(values = c("black","blue","red","yellow"))
# increase multiplication factor (b) -> set maximum y value
data.frame(x= seq(0,250,.5)) %>%
mutate(y0 = 0 + 20*(1-exp(-x*.03)),
y1 = 0 + 40*(1-exp(-x*.03)),
y2 = 0 + 60*(1-exp(-x*.03)),
y3 = 0 + 80*(1-exp(-x*.03))) %>%
ggplot() +
geom_abline() +
geom_point(aes(y=y0, x=x, color='b=20')) +
geom_point(aes(y=y1, x=x, color='b=40')) +
geom_point(aes(y=y2, x=x, color='b=60')) +
geom_point(aes(y=y3, x=x, color='b=80')) +
scale_color_manual(values = c("black","blue","red","yellow"))
# increase additive (offset) factor (a) -> shift the curve up/down
data.frame(x= seq(0,250,.5)) %>%
mutate(y0 = -10 + 60*(1-exp(-x*.03)),
y1 = 0 + 60*(1-exp(-x*.03)),
y2 = 10 + 60*(1-exp(-x*.03)),
y3 = 20 + 60*(1-exp(-x*.03))) %>%
ggplot() +
geom_abline() +
geom_point(aes(y=y0, x=x, color='a=-10')) +
geom_point(aes(y=y1, x=x, color='a=0')) +
geom_point(aes(y=y2, x=x, color='a=10')) +
geom_point(aes(y=y3, x=x, color='a=20')) +
scale_color_manual(values = c("black","blue","red","yellow"))
First, create a dataset that follows some arbitrary exponential distribution with some error in it:
set.seed(1);
df = data.frame(x = seq(0,250,.5) * runif(n=501)) %>%
mutate(y = (-2.5 + 75*(1-exp(-x*.05))) + rnorm(n=501,mean=0,sd=3))
I like to do some data visualization first to get a sense of what function(s) might work well. You can fit various linear and smoothed curves to the data using ggplot geom_smooth()
as follows:
ggplot(df, aes(x=x,y=y)) +
geom_abline() +
geom_point() +
geom_smooth(se=F, aes(color="loess")) +
geom_smooth(se=F, method="lm", aes(color="linear")) +
geom_smooth(se=F, method="lm", formula = y ~ poly(x,2), aes(color="2nd order polynomial")) +
geom_smooth(se=F, method="lm", formula = y ~ poly(x,3), aes(color="3rd order polynomial")) +
geom_smooth(se=F, method = "nls", formula = y ~ 1+MaxVal*(1-exp(-x*k)),
method.args = list(start=c(MaxVal=80, k=.25)),
aes(color="exponential no additive")) +
geom_smooth(se=F, method = "nls", formula = y ~ a+MaxVal*(1-exp(-x*k)),
method.args = list(start=c(a=1, MaxVal=80, k=.25)),
aes(color="exponential")) +
geom_smooth(se=F, formula = y ~ log(x+1), aes(color="logarithmic")) +
scale_color_manual(values = c("orange", "blue", "green", "red", "yellow", "pink", "purple"))
I've fit:
method = 'loess' and formula = 'y ~ x'
),formula = 'y ~ x'
),log(0) = -Inf
errors.As you can see, the linear and both polynomial curves fit quite poorly, whereas the loess, exponentials, and logarithmic seem pretty good at first glance.
Potential drawbacks of the loess curve are: (1) it is non-parametric (no equation for it), (2) it may be overfitting (there is more 'wiggle' in it), (3) it is not strictly increasing.
Potential drawback of the logarithmic curve: (1) it is not strictly increasing - at higher values of x it begins to decline.
You can see this more clearly by adding another point at a high x value (without changing the source dataset):
df %>%
bind_rows(data.frame(x=300, y=75)) %>%
ggplot(., aes(x=x,y=y)) +
geom_abline() +
geom_point() +
geom_smooth(se=F, aes(color="loess")) +
geom_smooth(se=F, method = "nls", formula = y ~ 1+MaxVal*(1-exp(-x*k)),
method.args = list(start=c(MaxVal=80, k=.25)),
aes(color="exponential no additive")) +
geom_smooth(se=F, method = "nls", formula = y ~ v+MaxVal*(1-exp(-x*k)),
method.args = list(start=c(MaxVal=80, k=.25, v=1)),
aes(color="exponential")) +
geom_smooth(se=F, formula = y ~ log(x+1), aes(color="logarithmic")) +
scale_color_manual(values = c("green", "red", "pink", "purple"))
Visualization is great and showed us that we probably want one of the two exponential functions, but now we need get those equations.
The usual ways are not going to work here:
# plugging the formula into lm() won't work because
# you have unknown, undefined coefficients (b, k):
lm(formula = as.formula("y ~ 1 + b*(1-exp(-(x*k)))"), method="lm", data = df)
# Error in eval(predvars, data, env) : object 'b' not found
# even if you arbitrarily plug in some values for those coefficients, it doesn't work:
lm(formula = as.formula("y ~ 1 + 60*(1-exp(-(x*.05)))"), method="lm", data = df)
# Error in terms.formula(formula, data = data) :
# invalid model formula in ExtractVars
You want a method that solves for the optimal coefficients. You can use the nls()
function, however, it may not work if you don't select some reasonable starting values for each of the parameters (it automatically initializes each of them to 1, which turns out to be pretty bad for this use case):
fit = nls(formula = as.formula("y ~ 1 + b*(1-exp(-(x*k)))"), data = df)
# Warning in nls(formula = as.formula("y ~ 1 + b*(1-exp(-(x*k)))"), data = df) :
# No starting values specified for some parameters.
# Initializing ‘b’, ‘k’ to '1.'.
# Consider specifying 'start' or using a selfStart model
# Error in numericDeriv(form[[3L]], names(ind), env, central = nDcentral) :
# Missing value or an infinity produced when evaluating the model
Reasonably close starting values will do just fine:
# without additive term (offset):
fit = nls(formula = as.formula("y ~ 1 + b*(1-exp(-(x*k)))"),
start = c(b=90, k=.05), data = df)
summary(fit)
Formula: y ~ 1 + b * (1 - exp(-(x * k)))
Parameters:
Estimate Std. Error t value Pr(>|t|)
b 71.881315 0.237978 302.1 <2e-16 ***
k 0.046566 0.000532 87.5 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.11 on 499 degrees of freedom
Number of iterations to convergence: 4
Achieved convergence tolerance: 0.00000256
Because this model used a fixed offset (a = 1) the b and k coefficients weren't as close as they could be to our original values (b = 71.9 vs. 75, and k=.046 vs. 0.05).
Adding an additional parameter for the offset coefficient gets us closer:
# with offset term:
fit = nls(formula = as.formula("y ~ a + b*(1-exp(-(x*k)))"),
start = c(a=-15, b=60, k=.05), data = df)
summary(fit)
Formula: y ~ a + b * (1 - exp(-(x * k)))
Parameters:
Estimate Std. Error t value Pr(>|t|)
a -2.830697 0.517438 -5.47 0.000000071 ***
b 75.317600 0.512742 146.89 < 2e-16 ***
k 0.050070 0.000709 70.61 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.95 on 498 degrees of freedom
Number of iterations to convergence: 3
Achieved convergence tolerance: 0.000000179
As you can see, the model with an offset term found coefficients quite close to our original values (a = -2.8 vs -2.5, b = 75.32 vs. 75, and k=.05 vs. 0.05), with residual standard error (RSE) basically equivalent to the sd=3
noise that I added to the arbitrary dataset.
Upvotes: 1
Reputation: 226182
R has self-starting non-linear fit functions (see apropos("^SS[a-z]")
for a list, and look at the corresponding help pages for details). These use heuristic algorithms to estimate reasonable starting values for specific functional forms. SSasympOff
(asymptotic + offset) is the appropriate one for this case.
set.seed(1) # so the example is reproduceable
df <- data.frame(Abundance=sort(sample(1:70,30)))
df$Richness <- with(df, 20*(1-exp(-0.03*Abundance))+rnorm(30))
fit <- nls(Richness ~ SSasympOff(Abundance, Asym, lrc, c0), data = df)
Results:
Nonlinear regression model
model: Richness ~ SSasympOff(Abundance, Asym, lrc, c0)
data: df
Asym lrc c0
19.3613 -3.4294 0.2421
residual sum-of-squares: 22.04
Number of iterations to convergence: 0
Achieved convergence tolerance: 5.97e-07
df$pred <- predict(fit)
plot(df$Abundance,df$Richness)
lines(df$Abundance,df$pred, col="blue",lty=2)
Upvotes: 2
Reputation: 5129
Thanks, jlhoward. I've got to something similar after reading the link sent by shujaa.
R <- function(a, b, abT) a*(1 - exp(-b*abT))
form <- Richness ~ R(a,b,Abundance)
fit <- nls(form, data=d, start=list(a=20,b=0.01))
plot(d$Abundance,d$Richness, xlab="Abundance", ylab="Richness")
lines(d$Abundance, predict(fit,list(x=d$Abundance)))
I've found the initial values by trial and error, though. So your solution looks better :)
EDIT: The result:
Upvotes: 0
Reputation: 59355
This should get you started. Read the documentation on nls(...)
(type ?nls
at the command prompt). Also look up ?summary
and ?predict
.
set.seed(1) # so the example is reproduceable
df <- data.frame(Abundance=sort(sample(1:70,30)))
df$Richness <- with(df, 20*(1-exp(-0.03*Abundance))+rnorm(30))
fit <- nls(Richness ~ C*(1-exp(k*Abundance)),data=df,
algorithm="port",
start=c(C=10,k=-1),lower=c(C=0,k=-Inf), upper=c(C=Inf,k=0))
summary(fit)
# Formula: Richness ~ C * (1 - exp(k * Abundance))
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# C 20.004173 0.726344 27.54 < 2e-16 ***
# k -0.030183 0.002334 -12.93 2.5e-13 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.7942 on 28 degrees of freedom
#
# Algorithm "port", convergence message: relative convergence (4)
df$pred <- predict(fit)
plot(df$Abundance,df$Richness)
lines(df$Abundance,df$pred, col="blue",lty=2)
Upvotes: 2