Martina Kavanová
Martina Kavanová

Reputation: 21

Fitting a sigmoid curve using a logistic function in R

I have data that follows a sigmoid curve and I would like fit a logistic function to extract the three (or two) parameters for each participant. I have found some methods online, but I'm not sure which is the correct option.

This tutorial explains that you should use the nls() function like this:

fitmodel <- nls(y~a/(1 + exp(-b * (x-c))), start=list(a=1,b=.5,c=25))
 ## get the coefficients using the coef function 
params=coef(fitmodel)

... where you clearly need the starting values to find the best-fitting values (?).

And then this post explains that to get the starting values, you can use a "selfstarting model can estimate good starting values for you, so you don't have to specify them":

fit <- nls(y ~ SSlogis(x, Asym, xmid, scal), data = data.frame(x, y))

However somewhere else I also read that you should use the SSlogis function for fitting a logistic function. Please could someone confirm whether these two steps are the best way to go about it? Or should I use values that I have extracted from previous similar data for the starting values?

Additionally, what should I do if I don't want the logistic function to be defined by the asymptote at all?

Thank you!

Upvotes: 1

Views: 3354

Answers (2)

tpetzoldt
tpetzoldt

Reputation: 5813

As @G. Grothendieck writes, there is no general "best way", it always depends on you particular aims. Use of SSLogis is a good idea, as you don't need to specify start values, but a definition of an own function is more flexible. See the following example, where we use heuristics to derive start values ourselves instead of specifying them manually. Then we fit a logistic model and as a small bonus, the Baranyi growth model with an explicit lag phase.

# time (t)
x <- c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

# Algae cell counts (Mio cells per ml)
y <- c(0.88, 1.02, 1.43, 2.79, 4.61, 7.12,
       6.47, 8.16, 7.28, 5.67, 6.91)

## we now plot the data linearly and logarithmically
## the layout function is another way to subdivide the plotting area
nf <- layout(matrix(c(1,2,3,3), 2, 2, byrow = TRUE), respect = TRUE)
layout.show(nf) # this shows how the plotting area is subdivided

plot(x, y)
plot(x, log(y))

## we see that the first points show the steepest increase,
## so we can estimate a start value of the growth rate
r <- (log(y[5]) - log(y[1])) / (x[5] - x[1])
abline(a=log(y[1]), b=r)

## this way, we have a heuristics for all start parameters:
## r:  steepest increase of y in log scale
## K:  maximum value
## N0: first value

## we can check this by plotting the function with the start values
f <- function(x, r, K, N0) {K /(1 + (K/N0 - 1) * exp(-r *x))}
plot(x, y, pch=16, xlab="time (days)", ylab="algae (Mio cells)")
lines(x, f(x, r=r, K=max(y), N0=y[1]), col="blue")

pstart <- c(r=r, K=max(y), N0=y[1])
aFit   <- nls(y ~ f(x, r, K,N0), start = pstart, trace=TRUE)

x1 <- seq(0, 25, length = 100)
lines(x1, predict(aFit, data.frame(x = x1)), col = "red")
legend("topleft",
       legend = c("data", "start parameters", "fitted parameters"),
       col = c("black", "blue", "red"),
       lty = c(0, 1, 1),
       pch = c(16, NA, NA))

summary(aFit)
(Rsquared <- 1 - var(residuals(aFit))/var(y))

## =============================================================================
## Approach with Baranyi-Roberts model
## =============================================================================

## sometimes, a logistic is not good enough. In this case, use another growth
## model
baranyi <- function(x, r, K, N0, h0) {
  A <- x + 1/r * log(exp(-r * x) + exp(-h0) - exp(-r * x - h0))
  y <- exp(log(N0) + r * A - log(1 + (exp(r * A) - 1)/exp(log(K) - log(N0))))
  y
}

pstart <- c(r=0.5, K=7, N0=1, h0=2)
fit2   <- nls(y ~ baranyi(x, r, K, N0, h0), start = pstart, trace=TRUE)

lines(x1, predict(fit2, data.frame(x = x1)), col = "forestgreen", lwd=2)

legend("topleft",
       legend = c("data", "logistic model", "Baranyi-Roberts model"),
       col = c("black", "red", "forestgreen"),
       lty = c(0, 1, 1),
       pch = c(16, NA, NA))

Sigmoidal curve

Upvotes: 1

G. Grothendieck
G. Grothendieck

Reputation: 269644

There isn't a best way but SSlogis does eliminate having to set starting values whereas if you specify the formula you have more control over the parameterization.

If the question is really how to fix a at a predetermined level, here the value 1, without rewriting the formula then set a before running nls and omit it from the starting values.

a <- 1
fo <- y ~ a / (1 + exp(-b * (x-c)))
nls(fo, start = list(b = 0.5, c = 25))

Alternately this substitutes a=1 into formula fo giving fo2 without having to rewrite the formula yourself.

fo2 <- do.call("substitute", list(fo, list(a = 1)))
nls(fo2, start = list(b = 0.5, c = 25))

Upvotes: 1

Related Questions