FriendlyFred
FriendlyFred

Reputation: 43

Fitting a sigmoidal curve to points with ggplot

I have a simple dataframe for the response measurements from a drug treatment at various doses:

drug <- c("drug_1", "drug_1", "drug_1", "drug_1", "drug_1", 
  "drug_1", "drug_1", "drug_1", "drug_2", "drug_2", "drug_2", 
        "drug_2", "drug_2", "drug_2", "drug_2", "drug_2")

conc <- c(100.00, 33.33, 11.11, 3.70, 1.23, 0.41, 0.14, 
        0.05, 100.00, 33.33, 11.11, 3.70, 1.23, 0.41, 0.14, 0.05)

mean_response <- c(1156, 1833, 1744, 1256, 1244, 1088, 678, 489, 
        2322, 1867, 1333, 944, 567, 356, 200, 177)

std_dev <- c(117, 317, 440, 200, 134, 38, 183, 153, 719,
      218, 185, 117, 166, 167, 88, 50)

df <- data.frame(drug, conc, mean_response, std_dev)

I can plot these point using the following code and get the basic foundation of the visualization that I would like:

p <- ggplot(data=df, aes(y=mean_response, x= conc, color = drug)) +
  geom_pointrange(aes(ymax = (mean_response + std_dev), ymin = (mean_response - std_dev))) +
  scale_x_log10()

p

plot

The next thing I would like to do with these data is add a sigmoidal curve to the plot, that fits the plotted points for each drug. Following that, I would like to calculate the EC50 for this curve. I realize I may not have the entire range of the sigmoidal curve in my data, but I am hoping to get the best estimate I can with what I have. Also, the final point for drug_1 does not follow the expected trend of a sigmoidal curve, but this is actually not unexpected as the solutions that the drug is in can inhibit responses at high concentrations (each drug is in a different solution). I would like to exclude this point from the data.

I am getting stuck at the step of fitting a sigmoidal curve to my data. I have looked over some other solutions to fitting sigmoidal curves to data but none seem to work.

One post that is very close to my problem is this: (sigmoid) curve fitting glm in r

Based on it, I tried:

p + geom_smooth(method = "glm", family = binomial, se = FALSE)

This gives the following error, and seems to default to plotting straight lines:

`geom_smooth()` using formula 'y ~ x'
Warning message:
Ignoring unknown parameters: family 

I have also tried the solution from this link: Fitting a sigmoidal curve to this oxy-Hb data

In this case, I get the following error:

Computation failed in `stat_smooth()`:
Convergence failure: singular convergence (7) 

and no lines are added to the plot.

I have tried looking up both of these errors but cannot seem to find a reason that makes sense with my data.

Any help would be much appreciated!

Upvotes: 3

Views: 6414

Answers (2)

Ben Bolker
Ben Bolker

Reputation: 226182

As I said in a comment, I would only use geom_smooth() for a very easy problem; as soon as I run into trouble I use nls instead.

My answer is very similar to @Duck's, with the following differences:

  • I show both unweighted and (inverse-variance) weighted fits.
  • In order to get the weighted fits to work, I had to use the nls2 package, which provides a slightly more robust algorithm
  • I use SSlogis() to get automatic (self-starting) initial parameter selection
  • I do all of the prediction outside of ggplot2, then feed it into geom_line()
p1 <- nls(mean_response~SSlogis(conc,Asym,xmid,scal),data=df,
          subset=(drug=="drug_1" & conc<100)
        ## , weights=1/std_dev^2  ## error in qr.default: NA/NaN/Inf ...
          )

library(nls2)
p1B <- nls2(mean_response~SSlogis(conc,Asym,xmid,scal),data=df,
            subset=(drug=="drug_1" & conc<100),
            weights=1/std_dev^2)

p2 <- update(p1,subset=(drug=="drug_2"))
p2B <- update(p1B,subset=(drug=="drug_2"))

pframe0 <- data.frame(conc=10^seq(log10(min(df$conc)),log10(max(df$conc)), length.out=100))
pp <- rbind(
    data.frame(pframe0,mean_response=predict(p1,pframe0),
               drug="drug_1",wts=FALSE),
    data.frame(pframe0,mean_response=predict(p2,pframe0),
               drug="drug_2",wts=FALSE),
    data.frame(pframe0,mean_response=predict(p1B,pframe0),
               drug="drug_1",wts=TRUE),
    data.frame(pframe0,mean_response=predict(p2B,pframe0),
               drug="drug_2",wts=TRUE)
)

library(ggplot2); theme_set(theme_bw())
(ggplot(df,aes(conc,mean_response,colour=drug)) +
 geom_pointrange(aes(ymin=mean_response-std_dev,
                     ymax=mean_response+std_dev)) +
 scale_x_log10() +
 geom_line(data=pp,aes(linetype=wts),size=2)
)

enter image description here

I believe the EC50 is equivalent to the xmid parameter ... note the large differences between weighted and unweighted estimates ...

Upvotes: 4

Duck
Duck

Reputation: 39595

I would suggest next approach which is close to what you want. I also tried with a setting for your data using binomial family but there are some issues about values between 0 and 1. In that case you would need an additional variable in order to determine the respective proportions. The code in the following lines use a non linear approximation in order to sketch your output.

Initially, the data:

library(ggplot2)
#Data
df <- structure(list(drug = c("drug_1", "drug_1", "drug_1", "drug_1", 
"drug_1", "drug_1", "drug_1", "drug_1", "drug_2", "drug_2", "drug_2", 
"drug_2", "drug_2", "drug_2", "drug_2", "drug_2"), conc = c(100, 
33.33, 11.11, 3.7, 1.23, 0.41, 0.14, 0.05, 100, 33.33, 11.11, 
3.7, 1.23, 0.41, 0.14, 0.05), mean_response = c(1156, 1833, 1744, 
1256, 1244, 1088, 678, 489, 2322, 1867, 1333, 944, 567, 356, 
200, 177), std_dev = c(117, 317, 440, 200, 134, 38, 183, 153, 
719, 218, 185, 117, 166, 167, 88, 50)), class = "data.frame", row.names = c(NA, 
-16L))

In a non linear least squares, you need to define initial values for the search of ideal parameters. We use next code with base function nls() to obtain those initial values:

#Drug 1
fm1 <- nls(log(mean_response) ~ log(a/(1+exp(-b*(conc-c)))), df[df$drug=='drug_1',], start = c(a = 1, b = 1, c = 1))
#Drug 2
fm2 <- nls(log(mean_response) ~ log(a/(1+exp(-b*(conc-c)))), df[df$drug=='drug_2',], start = c(a = 1, b = 1, c = 1))

With this initial approach of parameters, we sketch the plot using geom_smooth(). We again use nls() to find the right parameters:

#Plot
ggplot(data=df, aes(y=mean_response, x= conc, color = drug)) +
  geom_pointrange(aes(ymax = (mean_response + std_dev), ymin = (mean_response - std_dev))) +
  geom_smooth(data = df[df$drug=='drug_1',],method = "nls", se = FALSE,
              formula = y ~ a/(1+exp(-b*(x-c))),
              method.args = list(start = coef(fm1),
                                 algorithm='port'),
              color = "tomato")+
  geom_smooth(data = df[df$drug=='drug_2',],method = "nls", se = FALSE,
              formula = y ~ a/(1+exp(-b*(x-c))),
              method.args = list(start = coef(fm0),
                                 algorithm='port'),
              color = "cyan3")

The output:

enter image description here

Upvotes: 1

Related Questions