Onur Alkan

Reputation: 85

How can I find non-linear regression model starting values?

I'm trying to fit a non-linear tree diameter-height model (Max & Burkhart, 1976) to my data set in R. The data consist of D, diameter at breast height (cm); H, total tree height (m); hi, section height above ground level; di, diameter at height hi; etc.

I'm having trouble fitting the model, and I think the cause is the starting parameter values. I get "NaNs produced" errors. Tweaking the starting values reduced the number of errors to one, but not to zero, so I need a way to estimate starting values for a non-linear regression model. I looked into self-starting models but could not apply them to my equation, both because the equation is complex and because I lack the background. My full data set is linked below in case someone can show me a way.

I'm not sure whether I can attach files to a question, so I uploaded my data to Google Drive for anyone who wants to view or download it: https://drive.google.com/file/d/1q7W1bUcx4sK2G2QPte7ZtCudSLfBxpet/view?usp=sharing

# Function to compute Max & Burkhart (1976) equation
ComputeDi.MaxBurkhart <- function(hi, d, h, b1, b2, b3, b4, a1, a2){
    x <- hi / h
    x1 <- x - 1 
    x2 <- x ^ 2 - 1
    di <- d * sqrt(b1 * x1 + b2 * x2 + b3 * (a1 - x) ^ 2 * ((a1 - x) >= 0.0) + b4 * (a2 - x) ^ 2 * ((a2 - x) >= 0.0))
    return(di)
}

# Set the working directory
setwd("../Data")

# Load data and rename some variables
sylvestris <- read.csv("mydata.csv")

# Global fitting
nlmod.fp.di <- nls(di ~ ComputeDi.MaxBurkhart(hi, d, h, b1, b2, b3, b4, a1, a2),
                   data = sylvestris,
                   start = c(b1 = -2.53, b2 = 1.2, b3 = -1.5, b4 = 22, a1 = 0.72, a2 = 0.15),
                   control = nls.control(tol = 1e-07))
summary(nlmod.fp.di, correlation = TRUE)

Everything works up to this point. The NaN errors appear in the following block:

# Set seed and select names of trees
trees <- unique(sylvestris$tree) 
set.seed(15)
result.list <- list()
i <- 1
while(length(trees) > 0){
    # Draw up to 10 trees; the last batch may hold fewer than 10
    tree.smp <- trees[sample.int(length(trees), min(10, length(trees)))]
    sylvestris.smp <- sylvestris[sylvestris$tree %in% tree.smp, ]
    fitting.ols <- try(nls(di ~ ComputeDi.MaxBurkhart(hi, d, h, b1, b2, b3, b4, a1, a2),
                           data = sylvestris.smp,
                           start = c(b1 = -2.53, b2 = 1.2, b3 = -1.5, b4 = 22, a1 = 0.72, a2 = 0.15),
                           control = nls.control(tol = 1e-07)), silent = TRUE)
    if (inherits(fitting.ols, "try-error")) {
        # Fit failed: record NA for the 6 coefficients, NS and RSE
        fit.smp <- data.frame(trees = paste(tree.smp, collapse = "_"), t(rep(NA, 8)))
        names(fit.smp) <- c("trees", "b1", "b2", "b3", "b4", "a1", "a2", "NS", "RSE")
    } else {
        nlmod.ols <- fitting.ols
        # NS counts parameters with p-value > 0.05; RSE is the residual standard error
        fit.smp <- data.frame(trees = paste(tree.smp, collapse = "_"), t(coef(fitting.ols)), NS = sum(summary(fitting.ols)$parameters[, 4] > 0.05), RSE = summary(fitting.ols)$sigma)
    }
    result.list[[i]] <- fit.smp
    i <- i + 1
    trees <- trees[!trees %in% tree.smp]        
}     

I expect significant parameter estimates without any NaN errors. I'm fairly sure the problem lies in the starting values, because this code block works perfectly with another data set; the errors only appeared after I switched to this one. Thank you in advance.

Upvotes: 0

Views: 548

Answers (1)

Jet

Reputation: 690

You can try the nls.multstart package, which is designed to make choosing starting values easier.

You specify a range for each starting parameter. The package then fits the model from many starting points drawn from those ranges and returns the best fit, selected by AIC score.
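A minimal sketch of that approach, under stated assumptions rather than a tested solution for your data. It uses simulated trees instead of your file: the `truth` values, the simulated tree dimensions, and the `start_lower` / `start_upper` ranges are all made up for illustration and would need adapting. The "NaNs produced" warning most likely comes from `sqrt()` receiving a negative argument for some trial parameter values, which is why trying many starting points can help. The fit itself only runs if nls.multstart is installed.

```r
# Max & Burkhart (1976) equation, copied from the question
ComputeDi.MaxBurkhart <- function(hi, d, h, b1, b2, b3, b4, a1, a2) {
    x <- hi / h
    x1 <- x - 1
    x2 <- x ^ 2 - 1
    d * sqrt(b1 * x1 + b2 * x2 +
             b3 * (a1 - x) ^ 2 * ((a1 - x) >= 0.0) +
             b4 * (a2 - x) ^ 2 * ((a2 - x) >= 0.0))
}

# Simulated data (hypothetical values, NOT the asker's data set)
set.seed(1)
truth <- c(b1 = -3.0, b2 = 1.5, b3 = -1.3, b4 = 30, a1 = 0.75, a2 = 0.10)
sim <- do.call(rbind, lapply(1:20, function(k) {
    d <- runif(1, 15, 45)                       # breast height diameter (cm)
    h <- runif(1, 12, 28)                       # total tree height (m)
    hi <- seq(0.3, 0.95 * h, length.out = 10)   # section heights (m)
    data.frame(tree = k, d = d, h = h, hi = hi)
}))
sim$di <- do.call(ComputeDi.MaxBurkhart,
                  c(list(hi = sim$hi, d = sim$d, h = sim$h), as.list(truth))) +
    rnorm(nrow(sim), sd = 0.5)

# Multi-start fit: starting values are drawn at random between
# start_lower and start_upper (ranges here are illustrative guesses)
if (requireNamespace("nls.multstart", quietly = TRUE)) {
    fit <- nls.multstart::nls_multstart(
        di ~ ComputeDi.MaxBurkhart(hi, d, h, b1, b2, b3, b4, a1, a2),
        data = sim,
        iter = 500,
        start_lower = c(b1 = -6, b2 = 0.5, b3 = -3, b4 = 5,  a1 = 0.5, a2 = 0.05),
        start_upper = c(b1 = -1, b2 = 3.0, b3 = 0,  b4 = 60, a1 = 0.9, a2 = 0.30),
        supp_errors = "Y",        # hide messages from individual failed fits
        convergence_count = 100   # stop early once the best fit stops improving
    )
    if (!is.null(fit)) print(coef(fit))
}
```

For the per-sample loop in the question, the same call could replace `nls()` inside `try()`, with `data = sylvestris.smp`. Ranges centred on the estimates from the global fit would be a reasonable first guess.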

Upvotes: 0
