Reputation: 7755
I am trying to fit exponential decay functions to data that has only a few time points. I would like to use the exponential decay equation y = y0*e^(-r*time) in order to compare r (or eventually the half-life) between datasets and factors. I have understood that using a linear fit instead of nls is a better alternative for this particular function [1,2] if I want to estimate confidence intervals (which I do).
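Taking natural logarithms turns this model into a straight line in time (t = time), which is what makes an ordinary linear fit possible. The half-life then follows directly from r. These are standard identities, written out as a short derivation rather than taken from the cited references:

```latex
\begin{aligned}
y &= y_0 \, e^{-r t} \\
\ln y &= \ln y_0 - r t \\
y(t_{1/2}) &= \tfrac{1}{2} y_0 \quad\Longrightarrow\quad t_{1/2} = \frac{\ln 2}{r}
\end{aligned}
```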
Copy this to get some example data:
x <- structure(list(Factor = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,
1L, 3L, 3L, 3L, 2L, 2L, 4L, 4L, 4L, 3L, 3L, 3L, 1L, 1L, 1L, 1L,
3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 3L,
3L, 3L, 3L, 1L, 1L, 1L, 1L), .Label = c("A", "B", "C", "D"), class = "factor"),
time = c(0.25, 0.26, 0.26, 0.26, 0.27, 0.29, 0.29, 0.33,
0.38, 0.38, 0.38, 0.39, 0.4, 0.4, 0.41, 0.45, 0.45, 0.45,
0.45, 0.47, 0.51, 0.51, 0.52, 0.57, 0.57, 0.57, 0.57, 0.58,
0.58, 0.58, 0.6, 0.6, 0.6, 0.61, 0.61, 0.61, 0.62, 0.62,
0.64, 0.64, 0.67, 0.67, 0.67, 0.67, 0.69, 0.7, 0.7, 0.71,
0.76, 0.76, 0.77, 0.77, 0.79, 0.79, 0.8, 0.8, 0.83, 0.83,
0.84, 0.84, 0.86, 0.86, 0.87, 0.87, 18.57, 18.57, 18.57,
18.58, 18.69, 18.69, 18.7, 18.7, 18.7, 18.71, 18.71, 18.71,
18.74, 18.74, 18.74, 18.79, 18.85, 18.85, 18.86, 18.88, 18.89,
18.89, 18.89, 18.93, 18.93, 18.95, 18.95, 18.95, 18.96, 18.96,
18.96, 20.57, 20.57, 20.61, 20.62, 20.66, 20.67, 20.67, 20.67,
20.72, 20.72, 20.72, 21.18, 21.19, 21.19, 21.19, 21.22, 21.22,
21.22, 21.23, 21.25, 21.25, 21.25, 21.25, 87.58, 87.58, 87.64,
87.64, 87.65, 87.84, 87.85, 87.91, 87.91, 87.91, 89.27, 89.28,
89.28, 89.36, 89.36, 89.4, 89.4, 110.91, 112.19, 112.19,
112.2, 112.2, 112.24, 112.25, 112.25, 112.26, 185.6, 185.6,
185.63, 185.63, 185.64, 213, 234.96, 234.97, 234.97, 234.98,
235.01, 235.01, 235.02, 235.02), y = c(58.1, 42.9, 54.2,
45.3, 51.2, 44.4, 56.9, 53.4, 61.3, 49.3, 54.4, 55.6, 25.6,
48.1, 50.8, 54.7, 41.8, 46.2, 39.5, 51.7, 37.7, 43.1, 44.6,
48.4, 50.9, 62.5, 58.6, 47.8, 44.3, 55.6, 44.9, 49.1, 49.1,
60.3, 40.8, 57.6, 42.9, 60, 49.4, 54.1, 37.8, 46.5, 59, 64.3,
48, 54.3, 51.7, 59, 57.1, 29.4, 49.2, 50, 41.3, 40.5, 43.4,
48.6, 38.5, 35.7, 43.6, 60, 32, 27.3, 34.3, 44.4, 36.5, 25.4,
22.6, 25.5, 24.1, 18.9, 25, 5.9, 19.6, 15.7, 32.3, 14.3,
23.4, 29.4, 17, 18.3, 34.4, 26.4, 35.7, 22.6, 23.5, 19.3,
25.5, 34.7, 45.5, 38.1, 33.8, 47.9, 32.3, 32.1, 43, 27.8,
33.3, 25.5, 22.2, 29.2, 24.2, 22.8, 19.2, 31.6, 20.8, 26.4,
35.8, 50, 10.7, 24, 54.3, 67, 77.7, 51.7, 64.8, 49.3, 57.8,
43.2, 17, 17.4, 36.4, 60.2, 36, 4, 0, 0, 9.1, 2.9, 24.3,
18.8, 36, 16.3, 18.4, 17.1, 26.5, 29.3, 17.4, 23.1, 25.7,
32.7, 16.3, 14.6, 13.7, 16.2, 16.7, 21.9, 0, 0, 11.6, 8.6,
0, 3.7, 3.6, 5, 3.2, 0, 2.5, 5.7)), .Names = c("Factor",
"time", "y"), row.names = c(NA, -158L), class = "data.frame")
I manage to do this for a single level using the standard log-linear form log(y) ~ time (thanks to this example), but fail when trying to fit several parameters in linear space.
summary(lm(log(y) ~ time, data = x, subset = Factor == "A" & y > 0)) # I need the summary statistics to compare models (zeros dropped: log(0) is -Inf)
library(ggplot2)
ggplot(x, aes(x = time, y = y, color = Factor)) + geom_point() + geom_smooth(method = "glm", formula = y ~ x, method.args = list(family = gaussian(link = "log"), start = c(5, 0)))
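For reference, here is a minimal sketch of fitting one log-linear model per level and turning each slope into r, a 95% confidence interval for r, and a half-life. It uses hypothetical simulated data (`sim`, with true rates 0.01 and 0.02) rather than the data above, and it assumes that observations with y = 0 can simply be dropped, since log(0) is -Inf; whether that is acceptable depends on what the zeros mean in your experiment.

```r
# Hypothetical simulated data (not the data in the question): two levels
# decaying at known rates r = 0.01 (A) and 0.02 (B), with log-normal noise.
set.seed(1)
sim <- data.frame(Factor = factor(rep(c("A", "B"), each = 20)),
                  time   = rep(seq(0, 200, length.out = 20), 2))
sim$y <- 50 * exp(-ifelse(sim$Factor == "A", 0.01, 0.02) * sim$time) *
  exp(rnorm(nrow(sim), sd = 0.1))

# One log-linear fit per level; rows with y <= 0 are dropped because log(0) is -Inf.
fits <- lapply(split(sim, sim$Factor), function(d) {
  lm(log(y) ~ time, data = d, subset = y > 0)
})

# The slope on the log scale is -r, so flip the sign and swap the CI bounds.
rates <- t(sapply(fits, function(f) {
  ci <- confint(f, "time", level = 0.95)
  c(r = -unname(coef(f)["time"]), lwr = -ci[1, 2], upr = -ci[1, 1])
}))
rates <- cbind(rates, half_life = log(2) / rates[, "r"])
print(rates)
```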
Here is what I have tried:
## Summary
log.dec.fun <- function(N, r, time) -r*time + log(N) # The function in linear format
summary(glm(y ~ log.dec.fun(N, r, time), data = x, subset = Factor, start = c(5,0)))
# Error in log.dec.fun(N, r, time) : object 'r' not found
predict(glm(y ~ log.dec.fun(N, r, time), data = x, start = c(5,0)))
# Error in log.dec.fun(N, r, time) : object 'r' not found
## Plot
ggplot(x, aes(x = time, y = y, color = Factor)) + geom_point() + geom_smooth(method = "glm", formula = y ~ log.dec.fun(N, r, time), start = c(5,0))
#Error in log.dec.fun(N, r, time) : object 'r' not found
#Error in if (nrow(layer_data) == 0) return() : argument is of length zero
I can get quite satisfactory models using nls, but I have learned that calculating confidence intervals for nls fits verges on magic and that beginners should not even try.
library(ggplot2)
dec.fun <- function(N, r, time) N*exp(-r*time) ## The function in non-linear form
g <- c()
for(i in 1:nlevels(x$Factor)){
  z <- subset(x, Factor == levels(x$Factor)[i])
  # Predictions are appended level by level, in the original row order within each level
  g <- append(g, predict(nls(y ~ dec.fun(N, r, time), data = z, start = list(N = 5, r = 0))))
}
# Sort rows to match the order of g (the data are already ordered by time within each level)
x <- x[with(x, order(Factor, time)),]
x$modelled <- g
ggplot(x, aes(x = time, color = Factor)) + geom_point(aes(y = y)) + geom_line(aes(y = modelled))
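If you do stay with nls, a rough starting point is a Wald-type interval built from the standard errors that summary() reports for an nls fit. This is a minimal sketch on hypothetical simulated data (`d`, true N = 50 and r = 0.015). These intervals rely on a local linear approximation of the model and can be unreliable with only a few time points, which is part of why the log-linear route is often preferred.

```r
# Hypothetical simulated data for a single level (not the data in the question).
set.seed(2)
d <- data.frame(time = seq(0, 200, length.out = 25))
d$y <- 50 * exp(-0.015 * d$time) + rnorm(nrow(d), sd = 2)

dec.fun <- function(N, r, time) N * exp(-r * time)  # non-linear form
fit_nls <- nls(y ~ dec.fun(N, r, time), data = d, start = list(N = 40, r = 0.01))

# Approximate Wald 95% intervals: estimate +/- t quantile * standard error.
s   <- summary(fit_nls)
est <- s$coefficients               # columns "Estimate", "Std. Error", ...
tq  <- qt(0.975, df = s$df[2])      # residual degrees of freedom
wald <- cbind(estimate = est[, "Estimate"],
              lwr = est[, "Estimate"] - tq * est[, "Std. Error"],
              upr = est[, "Estimate"] + tq * est[, "Std. Error"])
print(wald)
```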
So my question is: how can I fit exponential decay functions in R and ggplot2 using a linear approximation? There is an answer on SO where @Joe Kington indicates that this is possible and provides Python code. Unfortunately, I do not understand Python.
Upvotes: 4
Views: 22046
Reputation: 36076
I believe you simply need to allow separate slopes and intercepts to be fit for each level of your grouping variable Factor when you fit the model with a natural log transformation of the response. Taking logs of y = y0*e^(-r*time) gives log(y) = log(y0) - r*time, a straight line whose intercept is log(y0) and whose slope is -r. I call this a separate lines model. You can then predict and get confidence (or prediction) intervals on the log scale for each Factor level, and back-transform to see the lines (much like the graphs in your original post from ggplot2).
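Example of a separate lines model in R (log(0) is -Inf, so rows with y = 0 have to be dropped or otherwise handled before fitting on the log scale):
fit1 = lm(log(y) ~ time*Factor, data = x, subset = y > 0)
summary(fit1)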
The output of this model will show the estimated intercept and slope for the reference level of Factor, and the differences in intercept and slope between the reference level and each of the other levels.
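Because the reference level's slope is the baseline, the `time:FactorB` coefficient is the difference in log-scale slopes, which works out to r_A - r_B, so its t test in `summary()` is a test of whether the two decay rates differ. A minimal sketch on hypothetical simulated data (`sim`, true rates 0.01 for A and 0.02 for B, zeros dropped before taking logs):

```r
# Hypothetical simulated data (not the data in the question).
set.seed(1)
sim <- data.frame(Factor = factor(rep(c("A", "B"), each = 20)),
                  time   = rep(seq(0, 200, length.out = 20), 2))
sim$y <- 50 * exp(-ifelse(sim$Factor == "A", 0.01, 0.02) * sim$time) *
  exp(rnorm(nrow(sim), sd = 0.1))

fit1 <- lm(log(y) ~ time * Factor, data = sim, subset = y > 0)
cf <- coef(fit1)
r_A <- -unname(cf["time"])                        # reference level
r_B <- -unname(cf["time"] + cf["time:FactorB"])   # reference slope + difference

# The interaction row tests H0: equal slopes, i.e. equal decay rates.
diff_row <- coef(summary(fit1))["time:FactorB", ]
print(c(r_A = r_A, r_B = r_B, diff_row))
```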
Alternatively, you could code the separate lines model so that each level gets its own intercept and slope directly:
fit2 = lm(log(y) ~ Factor + Factor:time - 1, data = x, subset = y > 0)
summary(fit2)
This will show you the estimated intercept and slope separately for each level of Factor in your output.
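With this parametrization each level's coefficients can be turned straight into y0, r and a half-life, and the confidence limits for r can be transformed the same way (the bounds swap because r = -slope, and swap again for the half-life because a larger r means a shorter half-life). A minimal sketch on hypothetical simulated data (`sim`, true rates 0.01 and 0.02); the half-life interval is only meaningful when the whole interval for r is positive:

```r
# Hypothetical simulated data (not the data in the question).
set.seed(1)
sim <- data.frame(Factor = factor(rep(c("A", "B"), each = 20)),
                  time   = rep(seq(0, 200, length.out = 20), 2))
sim$y <- 50 * exp(-ifelse(sim$Factor == "A", 0.01, 0.02) * sim$time) *
  exp(rnorm(nrow(sim), sd = 0.1))

fit2 <- lm(log(y) ~ Factor + Factor:time - 1, data = sim, subset = y > 0)
cf <- coef(fit2)
ci <- confint(fit2)
is_slope <- grepl("time", names(cf))   # slope terms; the others are log(y0) per level

out <- data.frame(y0    = exp(cf[!is_slope]),
                  r     = -cf[is_slope],
                  r_lwr = -ci[is_slope, 2],
                  r_upr = -ci[is_slope, 1],
                  row.names = levels(sim$Factor))
out$half_life <- log(2) / out$r
out$hl_lwr    <- log(2) / out$r_upr   # larger r -> shorter half-life
out$hl_upr    <- log(2) / out$r_lwr
print(out)
```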
To draw lines based on the model, you can use predict and then back-transform to the original scale. With the natural log transformation (and adding the values to your original dataset; newdata = x also returns predictions for the rows with y = 0 that were left out of the fit):
(x$pred = exp(predict(fit1, newdata = x)) )
You can also calculate the confidence intervals on the log scale and exponentiate them to the original scale if that's what you need:
exp(predict(fit1, newdata = x, interval = "confidence"))
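For smooth curves rather than values at the observed times only, you can predict on a regular time grid for each level. This is a minimal sketch on hypothetical simulated data (`sim`). Note that under multiplicative errors the back-transformed line estimates the median (geometric mean) of y, not its arithmetic mean.

```r
# Hypothetical simulated data (not the data in the question).
set.seed(1)
sim <- data.frame(Factor = factor(rep(c("A", "B"), each = 20)),
                  time   = rep(seq(0, 200, length.out = 20), 2))
sim$y <- 50 * exp(-ifelse(sim$Factor == "A", 0.01, 0.02) * sim$time) *
  exp(rnorm(nrow(sim), sd = 0.1))

fit1 <- lm(log(y) ~ time * Factor, data = sim, subset = y > 0)

# Regular time grid for every level; expand.grid turns the levels into a factor.
grid <- expand.grid(time = seq(0, 200, by = 5), Factor = levels(sim$Factor))
# Predict on the log scale, then exponentiate fit, lwr and upr together.
band <- exp(predict(fit1, newdata = grid, interval = "confidence"))
grid <- cbind(grid, band)
print(head(grid))
```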
Organizationally, you may want to put these as columns in your original dataset as well, which you could do in a variety of ways. The simplest may be to cbind them to the dataset x.
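A minimal sketch of the cbind route followed by a ggplot2 plot with the back-transformed confidence band, again on hypothetical simulated data (`sim`). The plot is only built when ggplot2 is installed:

```r
# Hypothetical simulated data (not the data in the question).
set.seed(1)
sim <- data.frame(Factor = factor(rep(c("A", "B"), each = 20)),
                  time   = rep(seq(0, 200, length.out = 20), 2))
sim$y <- 50 * exp(-ifelse(sim$Factor == "A", 0.01, 0.02) * sim$time) *
  exp(rnorm(nrow(sim), sd = 0.1))

fit1 <- lm(log(y) ~ time * Factor, data = sim, subset = y > 0)

# newdata = sim gives one row of fit/lwr/upr per original row, so cbind lines up.
sim_pred <- cbind(sim, exp(predict(fit1, newdata = sim, interval = "confidence")))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sim_pred, aes(x = time, colour = Factor)) +
    geom_ribbon(aes(ymin = lwr, ymax = upr, fill = Factor), alpha = 0.2, colour = NA) +
    geom_line(aes(y = fit)) +
    geom_point(aes(y = y))
  print(p)
}
```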
Upvotes: 8