PeterFoster
PeterFoster

Reputation: 339

nls peak fitting with a mixed normal and lognormal dataset (R)

I'm trying to use nls() to to curve-fit a dataset consisting of a mixture of normally and lognormally distributed values. However, the normally distributed subset contains negative values that the lognormal function cannot tolerate. Using nls(), is there a way to constrain the values which a PORTION of the fitted curve evaluate? (e.g. let the normal function evaluate across 0 and force the lognormal function to evaluate only x>0)

here's the test case I've been playing with:

test <- rnorm(5000, 2, 2)
test2 <- rlnorm(10000,2,2)
test3 <- append(test, test2)

bins <- seq(min(test3),100, .1)
tops <- data.frame(bin=bins, count=NA)
for (i in 1:nrow(tops)) {  tops[i,2] <- length(test3[which(test3>=tops[i,1] & 
     test3<tops[i+1,1])])  }

fit <- nls(count ~ exp(-(bin-n.mu)^2/(2*n.sd^2))/(sqrt(2*pi)*n.sd)*C1 + 
         exp(-(log(bin)-l.mu)^2/(2*l.sd^2))/(sqrt(2*pi)*l.sd*bin)*C2,
         data=tops, start=list(n.mu=2, n.sd=2, C1=500, l.mu=2, l.sd=2, C2=1000), 
         algorithm="port", trace=T)

coef(fit)
topsfit <- data.frame(bin=seq(-3, 100, 0.1))
topsfit$fit <- predict(fit, newdata=topsfit)

ggplot() + geom_point(data=tops, aes(x=(bins), y=count), shape=1, size=4) + 
  geom_path(data=topsfit, aes(x=(bin), y=fit), colour="red", size=1.5) 

Very simply, I'm fitting a normal PDF + lognormal PDF. The problem is that log(bin) in the lognormal PDF does not play nice with negative numbers... but I don't want to crop negative values because that affects the calculations for the underlying, normally distributed values. I just want the lognormal half of my curve to ignore them.

alternatively, is there a different approach to accomplishing this task that doesn't rely on nls()?

Upvotes: 2

Views: 540

Answers (1)

PeterFoster
PeterFoster

Reputation: 339

Seems like NO ONE wants to touch this topic, so I'll post a solution that I figured out with the help of a non-internet comrade-- the linchpin of my problem was in generating the functions that would comprise my curve. Writing the lognormal function separately allows conditional evaluation of x values, which is what I needed. Once I figured out that the nls() function operates on vectors and wrote my function to match, things shaped up quite nicely.

normal <- function(x, mu, sd, C) {  
  ans <- vector(length = length(x), mode = "numeric")
  for (i in 1:length(x)) {  
    value <- exp(-(x[i]-mu)^2/(2*sd^2))/(sqrt(2*pi)*sd)*C
    ans[i] <- value
  };  return(ans)  }

lognormal <- function(x, mu, sd, C) {
  ans <- vector(length = length(x), mode = "numeric")
  for (i in 1:length(x)) {
    if (x[i]>0) {
      value <- exp(-(log10(x[i])-mu)^2/(2*sd^2))/(sqrt(2*pi)*sd*x[i])*C
      ans[i] <- value
    }  else {  ans[i] <- 0  }  };  return(ans)  }


fit <- nls(count ~ normal(bin, n.mu, n.sd, C1) + lognormal(bin, l.mu, l.sd, C2),
             data=tops, start=list(n.mu=30, n.sd=30, C1=5000, 
             l.mu=4, l.sd=2, C2=5000), algorithm="port", trace=T)

...and just like that, you can solve for mixed normal and lognormal distributions.

Upvotes: 1

Related Questions