Rodrigo
Rodrigo

Reputation: 5129

fit exponential decay in increase form in R

I want to fit a function in the increase form of exponential decay (or asymptotic curve), such that:

Richness = C*(1-exp(k*Abundance))  # k < 0

I've read on this page about expn() function, but simply can't find it (or a nls package). All I found was a nlstools package, but it has no expn(). I tried with the usual nls and exp function, but I only get increasing exponentials...

I want to fit the graph like below (drawn in Paint), and I don't know where the curve should stabilize (Richness = C). Thanks in advance.

asymptotic curve

Upvotes: 3

Views: 6924

Answers (4)

Brian D
Brian D

Reputation: 2719

The increasing form of exponential decay function looks like this: y = a + b*(1-e^(-x * k))

You can manually change the coefficients (a, b, k) to understand how they affect the resulting curves:

library(tidyverse)

# increase exponent factor (k) -> increase initial growth rate
data.frame(x= seq(0,250,.5)) %>%
  mutate(y0 = 0 + 60*(1-exp(-x*.00)),
         y1 = 0 + 60*(1-exp(-x*.01)),
         y2 = 0 + 60*(1-exp(-x*.02)),
         y3 = 0 + 60*(1-exp(-x*.03))) %>%
  ggplot() + 
  geom_abline() + 
  geom_point(aes(y=y0, x=x, color='k=.00')) + 
  geom_point(aes(y=y1, x=x, color='k=.01')) + 
  geom_point(aes(y=y2, x=x, color='k=.02')) + 
  geom_point(aes(y=y3, x=x, color='k=.03')) + 
  scale_color_manual(values = c("black","blue","red","yellow"))

ggplot showing effect of different k values

# increase multiplication factor (b) -> set maximum y value
data.frame(x= seq(0,250,.5)) %>%
  mutate(y0 = 0 + 20*(1-exp(-x*.03)),
         y1 = 0 + 40*(1-exp(-x*.03)),
         y2 = 0 + 60*(1-exp(-x*.03)),
         y3 = 0 + 80*(1-exp(-x*.03))) %>%
  ggplot() + 
  geom_abline() + 
  geom_point(aes(y=y0, x=x, color='b=20')) + 
  geom_point(aes(y=y1, x=x, color='b=40')) + 
  geom_point(aes(y=y2, x=x, color='b=60')) + 
  geom_point(aes(y=y3, x=x, color='b=80')) +
  scale_color_manual(values = c("black","blue","red","yellow"))

ggplot showing effect of different b values

# increase additive (offset) factor (a) -> shift the curve up/down
data.frame(x= seq(0,250,.5)) %>%
  mutate(y0 = -10 + 60*(1-exp(-x*.03)),
         y1 = 0 + 60*(1-exp(-x*.03)),
         y2 = 10 + 60*(1-exp(-x*.03)),
         y3 = 20 + 60*(1-exp(-x*.03))) %>%
  ggplot() + 
  geom_abline() + 
  geom_point(aes(y=y0, x=x, color='a=-10')) + 
  geom_point(aes(y=y1, x=x, color='a=0')) + 
  geom_point(aes(y=y2, x=x, color='a=10')) + 
  geom_point(aes(y=y3, x=x, color='a=20')) +
  scale_color_manual(values = c("black","blue","red","yellow"))

ggplot showing effect of different a values

First, create a dataset that follows some arbitrary exponential distribution with some error in it:

set.seed(1); 
df = data.frame(x = seq(0,250,.5) * runif(n=501)) %>%
  mutate(y = (-2.5 + 75*(1-exp(-x*.05))) + rnorm(n=501,mean=0,sd=3))

I like to do some data visualization first to get a sense of what function(s) might work well. You can fit various linear and smoothed curves to the data using ggplot geom_smooth() as follows:

ggplot(df, aes(x=x,y=y)) + 
  geom_abline() + 
  geom_point() + 
  geom_smooth(se=F, aes(color="loess")) + 
  geom_smooth(se=F, method="lm", aes(color="linear")) +  
  geom_smooth(se=F, method="lm", formula = y ~ poly(x,2), aes(color="2nd order polynomial")) + 
  geom_smooth(se=F, method="lm", formula = y ~ poly(x,3), aes(color="3rd order polynomial")) + 
  geom_smooth(se=F, method = "nls", formula = y ~ 1+MaxVal*(1-exp(-x*k)), 
              method.args = list(start=c(MaxVal=80, k=.25)), 
              aes(color="exponential no additive")) +
  geom_smooth(se=F, method = "nls", formula = y ~ a+MaxVal*(1-exp(-x*k)), 
              method.args = list(start=c(a=1, MaxVal=80, k=.25)), 
              aes(color="exponential")) +
  geom_smooth(se=F, formula = y ~ log(x+1), aes(color="logarithmic")) +
  scale_color_manual(values = c("orange", "blue", "green", "red", "yellow", "pink", "purple"))

ggplot showing multiple different fitted curves

I've fit:

  1. the default (which uses method = 'loess' and formula = 'y ~ x'),
  2. a linear model (which uses formula = 'y ~ x'),
  3. 2nd order polynomial
  4. 3rd order polynomial
  5. increasing exponential decay with a fixed offset term (set to 1)
  6. increasing exponential decay with a variable offset term (a)
  7. a logarithmic model (with an x offset to avoid log(0) = -Inf errors.

As you can see, the linear and both polynomial curves fit quite poorly, whereas the loess, exponentials, and logarithmic seem pretty good at first glance. Potential drawbacks of the loess curve are: (1) it is non-parametric (no equation for it), (2) it may be overfitting (there is more 'wiggle' in it), (3) it is not strictly increasing.
Potential drawback of the logarithmic curve: (1) it is not strictly increasing - at higher values of x it begins to decline.

You can see this more clearly by adding another point at a high x value (without changing the source dataset):

df %>%
  bind_rows(data.frame(x=300, y=75)) %>%
  ggplot(., aes(x=x,y=y)) + 
  geom_abline() + 
  geom_point() + 
  geom_smooth(se=F, aes(color="loess")) + 
  geom_smooth(se=F, method = "nls", formula = y ~ 1+MaxVal*(1-exp(-x*k)), 
              method.args = list(start=c(MaxVal=80, k=.25)), 
              aes(color="exponential no additive")) +
  geom_smooth(se=F, method = "nls", formula = y ~ v+MaxVal*(1-exp(-x*k)), 
              method.args = list(start=c(MaxVal=80, k=.25, v=1)), 
              aes(color="exponential")) +
  geom_smooth(se=F, formula = y ~ log(x+1), aes(color="logarithmic")) +
  scale_color_manual(values = c("green", "red",  "pink", "purple")) 

ggplot showing just the loess, exponential, and logarithmic fitted curves with an additional point at (x=300,y=75)

Visualization is great and showed us that we probably want one of the two exponential functions, but now we need get those equations.

The usual ways are not going to work here:

# plugging the formula into lm() won't work because 
# you have unknown, undefined coefficients (b, k): 
lm(formula = as.formula("y ~ 1 + b*(1-exp(-(x*k)))"), method="lm", data = df)
# Error in eval(predvars, data, env) : object 'b' not found

# even if you arbitrarily plug in some values for those coefficients, it doesn't work: 
lm(formula = as.formula("y ~ 1 + 60*(1-exp(-(x*.05)))"), method="lm", data = df)
# Error in terms.formula(formula, data = data) : 
#  invalid model formula in ExtractVars

You want a method that solves for the optimal coefficients. You can use the nls() function, however, it may not work if you don't select some reasonable starting values for each of the parameters (it automatically initializes each of them to 1, which turns out to be pretty bad for this use case):

fit = nls(formula = as.formula("y ~ 1 + b*(1-exp(-(x*k)))"), data = df)
# Warning in nls(formula = as.formula("y ~ 1 + b*(1-exp(-(x*k)))"), data = df) :
#   No starting values specified for some parameters.
# Initializing ‘b’, ‘k’ to '1.'.
# Consider specifying 'start' or using a selfStart model
# Error in numericDeriv(form[[3L]], names(ind), env, central = nDcentral) : 
#   Missing value or an infinity produced when evaluating the model

Reasonably close starting values will do just fine:

# without additive term (offset): 
fit = nls(formula = as.formula("y ~ 1 + b*(1-exp(-(x*k)))"), 
          start = c(b=90, k=.05), data = df)
summary(fit)
Formula: y ~ 1 + b * (1 - exp(-(x * k)))

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
b 71.881315   0.237978   302.1   <2e-16 ***
k  0.046566   0.000532    87.5   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.11 on 499 degrees of freedom

Number of iterations to convergence: 4 
Achieved convergence tolerance: 0.00000256

Because this model used a fixed offset (a = 1) the b and k coefficients weren't as close as they could be to our original values (b = 71.9 vs. 75, and k=.046 vs. 0.05).

Adding an additional parameter for the offset coefficient gets us closer:

# with offset term:
fit = nls(formula = as.formula("y ~ a + b*(1-exp(-(x*k)))"), 
          start = c(a=-15, b=60, k=.05), data = df)
summary(fit)
Formula: y ~ a + b * (1 - exp(-(x * k)))

Parameters:
   Estimate Std. Error t value    Pr(>|t|)    
a -2.830697   0.517438   -5.47 0.000000071 ***
b 75.317600   0.512742  146.89     < 2e-16 ***
k  0.050070   0.000709   70.61     < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.95 on 498 degrees of freedom

Number of iterations to convergence: 3 
Achieved convergence tolerance: 0.000000179

As you can see, the model with an offset term found coefficients quite close to our original values (a = -2.8 vs -2.5, b = 75.32 vs. 75, and k=.05 vs. 0.05), with residual standard error (RSE) basically equivalent to the sd=3 noise that I added to the arbitrary dataset.

Upvotes: 1

Ben Bolker
Ben Bolker

Reputation: 226182

R has self-starting non-linear fit functions (see apropos("^SS[a-z]") for a list, and look at the corresponding help pages for details). These use heuristic algorithms to estimate reasonable starting values for specific functional forms. SSasympOff (asymptotic + offset) is the appropriate one for this case.

set up data (from @jlhoward)

set.seed(1)     # so the example is reproduceable
df <- data.frame(Abundance=sort(sample(1:70,30)))
df$Richness <- with(df, 20*(1-exp(-0.03*Abundance))+rnorm(30))  

fit with SSasympOff

fit <- nls(Richness ~ SSasympOff(Abundance, Asym, lrc, c0), data = df)

Results:

Nonlinear regression model
  model: Richness ~ SSasympOff(Abundance, Asym, lrc, c0)
   data: df
   Asym     lrc      c0 
19.3613 -3.4294  0.2421 
 residual sum-of-squares: 22.04

Number of iterations to convergence: 0 
Achieved convergence tolerance: 5.97e-07

plot (also from @jlhoward)

df$pred <- predict(fit)
plot(df$Abundance,df$Richness)
lines(df$Abundance,df$pred, col="blue",lty=2)

a plot of points with the fitted line (saturating exponential)

Upvotes: 2

Rodrigo
Rodrigo

Reputation: 5129

Thanks, jlhoward. I've got to something similar after reading the link sent by shujaa.

R <- function(a, b, abT) a*(1 - exp(-b*abT))
form <- Richness ~ R(a,b,Abundance)
fit <- nls(form, data=d, start=list(a=20,b=0.01))
plot(d$Abundance,d$Richness, xlab="Abundance", ylab="Richness")
lines(d$Abundance, predict(fit,list(x=d$Abundance)))

I've found the initial values by trial and error, though. So your solution looks better :)

EDIT: The result:

enter image description here

Upvotes: 0

jlhoward
jlhoward

Reputation: 59355

This should get you started. Read the documentation on nls(...) (type ?nls at the command prompt). Also look up ?summary and ?predict.

set.seed(1)     # so the example is reproduceable
df <- data.frame(Abundance=sort(sample(1:70,30)))
df$Richness <- with(df, 20*(1-exp(-0.03*Abundance))+rnorm(30))  

fit <- nls(Richness ~ C*(1-exp(k*Abundance)),data=df, 
           algorithm="port",
           start=c(C=10,k=-1),lower=c(C=0,k=-Inf), upper=c(C=Inf,k=0))
summary(fit)
# Formula: Richness ~ C * (1 - exp(k * Abundance))
#
# Parameters:
#    Estimate Std. Error t value Pr(>|t|)    
# C 20.004173   0.726344   27.54  < 2e-16 ***
# k -0.030183   0.002334  -12.93  2.5e-13 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.7942 on 28 degrees of freedom
#
# Algorithm "port", convergence message: relative convergence (4)

df$pred <- predict(fit)
plot(df$Abundance,df$Richness)
lines(df$Abundance,df$pred, col="blue",lty=2)

Upvotes: 2

Related Questions