LucaS
LucaS

Reputation: 1303

Error in str2lang(x) trying to use nls() within a function

I am trying to fit an exponential decay curve to some biological data using nls() within a function (I want to do this multiple times).

I receive an error with the way I am specifying nls() which I can't work out:

Error in str2lang(x) : <text>:2:0: unexpected end of input
1: ~ 
   ^

A small dummy dataset and the code I have written is below - any help would be greatly appreciated.

dat_small <-  structure(list(actual_time = c(5, 9, 30, 59, 119, 171, 216), 
                             activity = c(7158, 7386, 5496, 3884, 1502, 819, 409)), row.names = c(NA, 
                                                                                                  -7L), class = c("tbl_df", "tbl", "data.frame"))
curve_fit <-  function(x, y, df) {
  #browser()
  # Mono-exponential model for activity using actual sampling time
  mod <-  lm(log(df[[y]]) ~ df[[x]], data = df) # get starting values from log-linear model
  C0 <-  as.numeric(exp(mod$coef[1])) # exponential of log-linear intercept
  lambda1 <-  as.numeric(abs(mod$coef[2])) # absolute value of slope of log-linear model
  nls_mod_mono <-  nls(df[[y]] ~ C0*exp(-lambda1*df[[x]]), start = c(C0 = C0, lambda1 = lambda1), data = df)
  summary(nls_mod_mono) # estimate model
  xNew <- seq(0, 240, length.out = 100) # new grid of times
  yNew <- predict(nls_mod_mono, list(x = xNew)) # predict activity
  dfNew <-  data.frame(x = xNew, y = yNew)
  p_mono <-  ggplot(dfNew, aes(x = x, y = y)) +
    geom_line() +
    geom_point(data = df, aes(x = x, y = y), size = 3) +
    xlab("Actual Sampling Time (mins)") + ylab("Activity (Bq)") +
    theme_bw(base_size = 20)
  mono_half_life <-  0.693/as.numeric(coef(nls_mod_mono)[2]) # half-life
  mono_auc <-  as.numeric(coef(nls_mod_mono)[1])/as.numeric(coef(nls_mod_mono)[2]) # AUC to infinity (can be estimated in mono case by simply dividing the intercept by lambda)
  results <-  c(p_mono, mono_half_life, mono_auc)
}
curve_fit("actual_time", "activity", dat_small)

Upvotes: 2

Views: 101

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 270045

Questions to SO should have minimal code focused on the problem at hand. Here the question is how to avoid the error in the nls call so everything after that is not relevant and we omit it. (We do point out, however, that aes in ggplot2 accepts unquoted variables and since x and y are character write it like this aes(.data[[x]], .data[[y]]) .)

1) An easy fix is probably just to avoid the problem by defining a data frame with fixed x and y column names and then use that.

curve_fit1 <- function(x, y, df) {
  data <- data.frame(x = df[[x]], y = df[[y]])
  co <- coef(lm(log(y) ~ x, data))
  st <- c(C0 = exp(co[[1]]), lambda1 = -co[[2]])
  nls(y ~ C0 * exp(-lambda1 * x), data = data, start = st) 
}

curve_fit1("actual_time", "activity", dat_small)

giving

Nonlinear regression model
  model: y ~ C0 * exp(-lambda1 * x)
   data: data
       C0   lambda1 
8.003e+03 1.301e-02 
 residual sum-of-squares: 270020

Number of iterations to convergence: 4 
Achieved convergence tolerance: 4.228e-06

2) Even easier is to use the NLS.expoDecay self starting function in the statforbiology package in which case no starting value is needed. The output is as in (1).

library(statforbiology)

curve_fit2 <- function(x, y, df) {
  data <- data.frame(x = df[[x]], y = df[[y]])
  nls(y ~ NLS.expoDecay(x, C0, lambda1), data)
}

curve_fit2("actual_time", "activity", dat_small)

3) We could alternately use substitute to substitute the variables into the formula. This can apply to either (1) or (2). We will show it as applied to (2).

library(statforbiology)

curve_fit3 <- function(x, y, df) {
  fo <- substitute(y ~ NLS.expoDecay(x, C0, lambda1), 
    list(x = as.name(x), y = as.name(y)))
  nls(fo, df)
}

curve_fit3("actual_time", "activity", dat_small)

giving

Nonlinear regression model
  model: activity ~ NLS.expoDecay(actual_time, C0, lambda1)
   data: df
       C0   lambda1 
8.003e+03 1.301e-02 
 residual sum-of-squares: 270020

Number of iterations to convergence: 4 
Achieved convergence tolerance: 4.228e-06

4) Yet another altertnative is to use the drc package together with DRC.exponDecay from statforbiology. In this case the simple form of formula accepted allows us to use reformulate to generate it and furthermore the drc class object produced from drm has coef, plot and other methods -- try methods(class = "drc") .

library(drc)
library(statforbiology)

curve_fit4 <- function(x, y, df) {
  drm(reformulate(x, y), data = df, fct = DRC.expoDecay())
}

res <- curve_fit4("actual_time", "activity", dat_small)
plot(res)

screenshot

Upvotes: 4

jay.sf
jay.sf

Reputation: 73562

Try a formula for the nls, in this case activity ~ C0 * exp(-lambda1 * actual_time).

You can use as.formula(paste(y, "~ C0*exp(-lambda1*", x, ")", sep = "")), or sprintf as shown below.

To name the variable in list in the newdata= argument of predict with the name you specified in x, use setNames.

Note, that using df[[x]] is possible, as ist's still used in lm. But then the data= argument is redundant.

The as.numeric can be confusing, as it is actually already "numeric". Use unname instead.

curve_fit <- function(x, y, df, plot=FALSE) {
 ## Mono-exponential model for activity using actual sampling time
 mod <- lm(log(df[[y]]) ~ df[[x]], data=df)  ## get starting values from log-linear model
 C0 <- unname(exp(mod$coef[1]))  ## exponential of log-linear intercept
 lambda1 <- unname(abs(mod$coef[2]))  ## absolute value of slope of log-linear model
 fo <- as.formula(sprintf('%s ~ C0*exp(-lambda1*%s)', y, x))
 nls_mod_mono <- nls(fo, start=c(C0=C0, lambda1=lambda1), data=df)
 xNew <- seq(0, 240, length.out=100)  ## new grid of times
 yNew <- predict(nls_mod_mono, newdata= list(xNew) |> setNames(x))  ## predict activity
 fit_df <- data.frame(x=xNew, y=yNew)
 mono_half_life <- 0.693/coef(nls_mod_mono)[2]  ## half-life
 mono_auc <- coef(nls_mod_mono)[1]/coef(nls_mod_mono)[2]  ## AUC to infinity (can be estimated in mono case by simply dividing the intercept by lambda)
 list(fit_df=fit_df, 
      mono_half_life=unname(mono_half_life), 
      mono_auc=unname(mono_auc))
}

> res <- curve_fit("actual_time", "activity", dat_small)
> res$mono_half_life
[1] 53.25588
> res$mono_auc
[1] 615006.6

I recommend plotting the curve outside the function to avoid feature-creeping.

par(mar=c(5, 4, 2, 2))
plot(y ~ x, res$fit_df, type='l', xlab='Actual Sampling Time (mins)',
     ylab="Activity (Bq)", las=1)

enter image description here

Upvotes: 4

Related Questions