Reputation: 45
I have been trying to estimate a rather messy nonlinear regression model in R for quite some time now. After countless failed attempts using the nls
function, I am now trying my luck with optim
, which I have used many times in the past. For this example, I'll use the following data:
x1 <- runif(1000,0,7)
x2 <- runif(1000,0,7)
x3 <- runif(1000,0,7)
y <- log(.5 + .5*x1 + .7*x2 + .4*x3 + .05*x1^2 + .1*x2^2 + .15*x3^2 - .05*x1*x2 - .1*x1*x3 - .07*x2*x3 + .02*x1*x2*x2) + rnorm(1000)
I would like to estimate the parameters in the polynomial expression inside the log() function above, and so I have defined the following function to replicate a nonlinear least squares regression:
g <- function(coefs){
fitted <- coefs[1] + coefs[2]*x1 + coefs[3]*x2 + coefs[4]*x3 + coefs[5]*x1^2 + coefs[6]*x2^2 + coefs[7]*x3^2 + coefs[8]*x1*x2 + coefs[9]*x1*x3 + coefs[10]*x2*x3 + coefs[11]*x1*x2*x3
error <- y - log(fitted)
return(sum(error^2))
}
In order to avoid negative starting values inside the log() expression, I first estimate the linear model below:
lm.1 <- lm(I(exp(y)) ~ x1 + x2 + x3 + I(x1^2) + I(x2^2) + I(x3^2) + I(x1*x2) + I(x1*x3) + I(x2*x3) + I(x1*x2*x3))
intercept.start <- ifelse((min(fitted(lm.1)-lm.1$coefficients[1])) <= 0, -(min(fitted(lm.1)-lm.1$coefficients[1])) + .5, .5)
coefs.start <- c(intercept.start,lm.1$coefficients[-1])
Defining intercept.start
above guarantees that the expression inside of log() will be strictly positive at the outset. However, when I run the optim
command
nl.model <- optim(coefs.start, g, method="L-BFGS-B")
I get the following error message
Error in optim(coefs.start, g, method = "L-BFGS-B") :
L-BFGS-B needs finite values of 'fn'
In addition: Warning message:
In log(fitted) : NaNs produced
Does anyone know how I can force the optim
routine to simply disregard parameter estimates that would produce negative values inside of the log() expression? Thanks in advance.
Upvotes: 1
Views: 889
Reputation: 59395
Here's a slightly different approach.
Aside from the typo mentioned in the comment, if the issue is that the argument to the log(...)
is < 0 for certain parameter estimates, you can change the function definition to prevent that.
# just some setup - we'll need this later
set.seed(1)
err <- rnorm(1000, sd=0.1) # note smaller error sd
x1 <- runif(1000,0,7)
x2 <- runif(1000,0,7)
x3 <- runif(1000,0,7)
par <- c(0.5, 0.5, 0.7, 0.4, 0.05, 0.1, 0.15, -0.05, -0.1, -0.07, 0.02)
m <- cbind(1, x1, x2, x3, x1^2, x2^2, x3^2, x1*x2, x1*x3, x2*x3, x1*x2*x3)
y <- as.numeric(log(m %*% par)) + err
# note slight change in the model function definition
g <- function(coefs){
fitted <- coefs[1] + coefs[2]*x1 + coefs[3]*x2 + coefs[4]*x3 + coefs[5]*x1^2 + coefs[6]*x2^2 + coefs[7]*x3^2 + coefs[8]*x1*x2 + coefs[9]*x1*x3 + coefs[10]*x2*x3 + coefs[11]*x1*x2*x3
fitted <- ifelse(fitted<=0, 1, fitted) # ensures fitted > 0
error <- y - log(fitted)
return(sum(error^2))
}
lm.1 <- lm(I(exp(y)) ~ x1 + x2 + x3 + I(x1^2) + I(x2^2) + I(x3^2) + I(x1*x2) + I(x1*x3) + I(x2*x3) + I(x1*x2*x3))
nl.model <- optim(coef(lm.1), g, method="L-BFGS-B", control=list(maxit=1000))
nl.model$par
# (Intercept) x1 x2 x3 I(x1^2) I(x2^2) I(x3^2) I(x1 * x2) I(x1 * x3) I(x2 * x3) I(x1 * x2 * x3)
# 0.40453182 0.50136222 0.71696293 0.45335893 0.05461253 0.10210854 0.14913914 -0.06169715 -0.11195476 -0.08497180 0.02531717
with(nl.model, cat(convergence, message))
# 0 CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH
Note that these estimates are pretty close to the actual values. That's because in the setup I used a smaller error term (sd = 0.2 instead of 1). In your example, the error is large compared to the response (y
), so you're basically fitting random error.
If you fit the model using the actual parameter values as starting estimates, you get nearly identical results, no closer to the "true" values.
nl.model <- optim(par, g, method="L-BFGS-B", control=list(maxit=1000))
nl.model$par
# [1] 0.40222956 0.50159930 0.71734810 0.45459606 0.05465654 0.10206887 0.14899640 -0.06177640 -0.11209065 -0.08497423 0.02533085
with(nl.model, cat(convergence, message))
# 0 CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH
Try this with the original error (sd = 1) and see what happens.
Upvotes: 3
Reputation: 263441
Here's a log of my efforts to investigate. I put a maximum on the fitted values and got convergence. I then asked myself if increasing that max would do anything th the estimated parameters and found that there was no change... AND there was no difference from the starting values, so I think you messed up in building the function. Perhaps you can investigate further:
> gp <- function(coefs){
+
+ fitted <- coefs[1] + coefs[2]*x1 + coefs[3]*x2 + coefs[4]*x3 + coefs[5]*x1^2 + coefs[6]*x2^2 + coefs[7]*x3^2 + coefs[8]*x1*x2 + coefs[9]*x1*x3 + coefs[10]*x2*x3 + coefs[11]*x1*x2*x3 }
> describe( gp( coefs.start) ) #describe is from pkg:Hmisc
gp(coefs.start)
n missing unique Info Mean .05 .10 .25 .50 .75
1000 0 1000 1 13.99 2.953 4.692 8.417 12.475 18.478
.90 .95
25.476 28.183
lowest : 0.5000 0.5228 0.5684 0.9235 1.1487
highest: 41.0125 42.6003 43.1457 43.5950 47.2234
> g <- function(coefs){
+
+ fitted <- max( coefs[1] + coefs[2]*x1 + coefs[3]*x2 + coefs[4]*x3 + coefs[5]*x1^2 + coefs[6]*x2^2 + coefs[7]*x3^2 + coefs[8]*x1*x2 + coefs[9]*x1*x3 + coefs[10]*x2*x3 + coefs[11]*x1*x2*x3 , 1000)
+ error <- y - log(fitted)
+ return(sum(error^2))
+ }
> nl.model <- optim(coefs.start, g, method="L-BFGS-B")
> nl.model
$par
x1 x2 x3 I(x1^2)
0.77811231 -0.94586233 -1.33540959 1.65454871 0.31537594
I(x2^2) I(x3^2) I(x1 * x2) I(x1 * x3) I(x2 * x3)
0.45717138 0.11051418 0.59197115 -0.25800792 0.04931727
I(x1 * x2 * x3)
-0.08124126
$value
[1] 24178.62
$counts
function gradient
1 1
$convergence
[1] 0
$message
[1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
> g <- function(coefs){
+
+ fitted <- max( coefs[1] + coefs[2]*x1 + coefs[3]*x2 + coefs[4]*x3 + coefs[5]*x1^2 + coefs[6]*x2^2 + coefs[7]*x3^2 + coefs[8]*x1*x2 + coefs[9]*x1*x3 + coefs[10]*x2*x3 + coefs[11]*x1*x2*x3 , 100000)
+ error <- y - log(fitted)
+ return(sum(error^2))
+ }
> nl.model <- optim(coefs.start, g, method="L-BFGS-B")
> nl.model
$par
x1 x2 x3 I(x1^2)
0.77811231 -0.94586233 -1.33540959 1.65454871 0.31537594
I(x2^2) I(x3^2) I(x1 * x2) I(x1 * x3) I(x2 * x3)
0.45717138 0.11051418 0.59197115 -0.25800792 0.04931727
I(x1 * x2 * x3)
-0.08124126
$value
[1] 89493.99
$counts
function gradient
1 1
$convergence
[1] 0
$message
[1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
.
Upvotes: 1