Reputation: 339
I'm trying to use nls() to to curve-fit a dataset consisting of a mixture of normally and lognormally distributed values. However, the normally distributed subset contains negative values that the lognormal function cannot tolerate. Using nls(), is there a way to constrain the values which a PORTION of the fitted curve evaluate? (e.g. let the normal function evaluate across 0 and force the lognormal function to evaluate only x>0)
here's the test case I've been playing with:
test <- rnorm(5000, 2, 2)
test2 <- rlnorm(10000,2,2)
test3 <- append(test, test2)
bins <- seq(min(test3),100, .1)
tops <- data.frame(bin=bins, count=NA)
for (i in 1:nrow(tops)) { tops[i,2] <- length(test3[which(test3>=tops[i,1] &
test3<tops[i+1,1])]) }
fit <- nls(count ~ exp(-(bin-n.mu)^2/(2*n.sd^2))/(sqrt(2*pi)*n.sd)*C1 +
exp(-(log(bin)-l.mu)^2/(2*l.sd^2))/(sqrt(2*pi)*l.sd*bin)*C2,
data=tops, start=list(n.mu=2, n.sd=2, C1=500, l.mu=2, l.sd=2, C2=1000),
algorithm="port", trace=T)
coef(fit)
topsfit <- data.frame(bin=seq(-3, 100, 0.1))
topsfit$fit <- predict(fit, newdata=topsfit)
ggplot() + geom_point(data=tops, aes(x=(bins), y=count), shape=1, size=4) +
geom_path(data=topsfit, aes(x=(bin), y=fit), colour="red", size=1.5)
Very simply, I'm fitting a normal PDF + lognormal PDF. The problem is that log(bin) in the lognormal PDF does not play nice with negative numbers... but I don't want to crop negative values because that affects the calculations for the underlying, normally distributed values. I just want the lognormal half of my curve to ignore them.
alternatively, is there a different approach to accomplishing this task that doesn't rely on nls()?
Upvotes: 2
Views: 540
Reputation: 339
Seems like NO ONE wants to touch this topic, so I'll post a solution that I figured out with the help of a non-internet comrade-- the linchpin of my problem was in generating the functions that would comprise my curve. Writing the lognormal function separately allows conditional evaluation of x values, which is what I needed. Once I figured out that the nls() function operates on vectors and wrote my function to match, things shaped up quite nicely.
normal <- function(x, mu, sd, C) {
ans <- vector(length = length(x), mode = "numeric")
for (i in 1:length(x)) {
value <- exp(-(x[i]-mu)^2/(2*sd^2))/(sqrt(2*pi)*sd)*C
ans[i] <- value
}; return(ans) }
lognormal <- function(x, mu, sd, C) {
ans <- vector(length = length(x), mode = "numeric")
for (i in 1:length(x)) {
if (x[i]>0) {
value <- exp(-(log10(x[i])-mu)^2/(2*sd^2))/(sqrt(2*pi)*sd*x[i])*C
ans[i] <- value
} else { ans[i] <- 0 } }; return(ans) }
fit <- nls(count ~ normal(bin, n.mu, n.sd, C1) + lognormal(bin, l.mu, l.sd, C2),
data=tops, start=list(n.mu=30, n.sd=30, C1=5000,
l.mu=4, l.sd=2, C2=5000), algorithm="port", trace=T)
...and just like that, you can solve for mixed normal and lognormal distributions.
Upvotes: 1