Reputation: 5
I'm conducting a Monte Carlo study. I have a linear model with heteroskedasticity and left censoring of the dependent variable at 0. The mean of censoring rates is 25.9.
I get the error
Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'x'
after trying to estimate a tobit model.
vglm(y[i,]~X[1,i,]+X[2,i,]+X[3,i,]+X[4,i,],family=tobit(Lower=0))
My data are simulated from standard distribution so the problem shoudn't come from odd variables.
I found two other questions that had the same problem with real data : lm() NA/NaN/Inf error , lm() NA/NaN/Inf error But there didn't seem to be any satisfying answers. Besides my data are easily reproducible so it should help identifying the problem
Here are the codes :
library(VGAM)
set.seed(12345)
nobs=100
nsim=100
b=c(2,-2,-3,3)
g=c(1,0.2)
y=matrix(rep(0,nobs*nsim),ncol=nobs,nrow=nsim)
X=array(0,dim=c(4,nsim,nobs))
res=matrix(rep(0,nobs*nsim),ncol=nobs,nrow=nsim)
tobit=vector(mode="list",length=nsim)
for(i in 1:nsim){
# generate covariates :
X[1,i,]=rlnorm(n=nobs)
X[2,i,]=runif(n=nobs)<=.75
X[3,i,]=rnorm(mean = 3,n=nobs)
X[4,i,]=runif(n=nobs,min=0,max=10)
res[i,]=(g[1]+g[2]*X[4,i,])*rnorm(n=nobs)
# generate censored dependent variable
y[i,]=b[1]*X[1,i,]+b[2]*X[2,i,]+b[3]*X[3,i,]+b[4]*X[4,i,]+res[i,]
y[i,]=sapply(y[i,],FUN=function(x){max(0,x)}) #apply censoring
tobit[[i]]<-vglm(y[i,]~X[1,i,]+X[2,i,]+X[3,i,]+X[4,i,],
family = tobit(Lower=0))
}
Here is the traceback
traceback()
5: lm.fit(X.vlm, y = z.vlm, ...)
4: vlm.wfit(xmat = X.vlm.save, z, Hlist = NULL, U = U, matrix.out =FALSE,
is.vlmX = TRUE, qr = qr.arg, xij = NULL)
3: vglm.fitter(x = x, y = y, w = w, offset = offset, Xm2 = Xm2,
Ym2 = Ym2, etastart = etastart, mustart = mustart, coefstart =coefstart,
family = family, control = control, constraints = constraints,
criterion = control$criterion, extra = extra, qr.arg = qr.arg,
Terms = mt, function.name = function.name, ...)
2: vglm(y[1, ] ~ X[1, 1, ] + X[2, i, ] + X[3, i, ] + X[4, i, ],
family = tobit(Lower = 0))
1: traceback(vglm(y[1, ] ~ X[1, 1, ] + X[2, i, ] + X[3, i, ] + X[4,
i, ], family = tobit(Lower = 0)))
*** Edit :
By removing one covariate (I tried with X[3,i,] and X[4,i,]) and setting the lower censoring at -0.001 as BondedDust suggest, It works fine and I even push the number of replications to 1000 without major problems.
By just setting the lower censoring at -0.001, and keeping all the covariates, I get two errors out of 100 iterations. It is worth noting that the error is now
Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'y'
Besides I get these warnings
In vglm.fitter(x = x, y = y, w = w, offset = offset, Xm2 = Xm2, ... :
iterations terminated because half-step sizes are very small
Upvotes: 0
Views: 2341
Reputation: 263391
I noticed that this was reproducibly failing at i=1 so thought there might be a problem with the vglm
call itself. Looking at the examples in ?tobit
I added some parameters related to censored distributions, and started to get a few extra iterations. I then tried narrow the range of censoring and got more success with failure only 10% of the time. So I finally added a try() wrapper to let the loop iterate without stopping calculations and got a majority of successful runs with:
for(i in 1:nsim){
X[1,i,]=rlnorm(n=nobs)
X[2,i,]=runif(n=nobs)<=.75
X[3,i,]=rnorm(mean = 3,n=nobs)
X[4,i,]=runif(n=nobs,min=0,max=10)
res[i,]=(g[1]+g[2]*X[4,i,])*rnorm(n=nobs)
y[i,]=b[1]*X[1,i,]+b[2]*X[2,i,]+b[3]*X[3,i,]+b[4]*X[4,i,]+res[i,]
y[i,]=pmax(0,y[i,])
tobit[[i]]<-try( vglm(y[i,]~X[1,i,]+X[2,i,]+X[3,i,]+X[4,i,], crit = "coeff",
family = tobit(Lower=-.001, Upper=30, type.f = "cens")) )
}
Notice above that I replace your clunky and perhaps inefficient sapply( ... max)
with the equivalent pmax
.
> table( sapply(tobit, class))
try-error vglm
12 88
You can loop through the successful returns with:
sapply( tobit[ sapply(tobit, class) == "vglm"], coefficients)
Top of results:
[,1] [,2] [,3] [,4] [,5] [,6]
(Intercept):1 2.8460081 1.910137 1.672237 1.2888827 2.4970536 1.0006290
(Intercept):2 0.9183935 1.042424 1.094658 0.9767228 0.9263946 0.9250609
X[1, i, ] 1.7777788 1.880506 1.662835 1.6204394 1.4412304 1.6275208
X[2, i, ] -3.0847792 -0.453110 -1.152709 -0.9900163 -2.4705355 -0.9651577
X[3, i, ] -2.4272169 -2.094114 -2.314748 -2.4628501 -1.9001385 -2.1076416
X[4, i, ] 2.6225234 2.245107 2.460182 2.7027493 2.3653673 2.3841989
[,7] [,8] [,9] [,10] [,11] [,12]
(Intercept):1 0.9520376 1.6319010 1.572563 1.4709517 1.616158 2.4992492
(Intercept):2 0.8698777 0.9005506 1.147485 0.9285724 1.012186 0.9229233
X[1, i, ] 1.6483879 1.6789573 1.718641 1.6544123 1.599116 1.7204001
X[2, i, ] -0.3718720 -1.8690782 -2.408657 -1.7278915 -1.208939 -2.0037999
X[3, i, ] -2.2601637 -1.9118288 -2.359274 -1.7828438 -2.257556 -2.3778443
X[4, i, ] 2.5381367 2.3091630 2.583869 2.3582418 2.333988 2.4389336
After getting this modest degree of success I tried setting the Lower back to 0 and got all errors. Increasing the Upper value did not seem to affect success rates in limited testing. I'm unable to explain those findings, but perhaps the package author could be consulted.
Upvotes: 2