Reputation: 414
I have built a Bayesian Additive Regression Tree (BART), but I cannot seem to predict on the test data using the model. I have read the documentation, but it does not help much. See below for my example code:
x <- nsur.train[, 4:28]
y <- nsur.train$TARGET_AMOUNT
xtrain <- x[train, ]
ytrain <- y[train]
xtest <- x[-train, ]
ytest <- y[-train]
## fit bayesian additive reg tree
bartfit <- gbart(xtrain, ytrain, xtest)
## predict
predict(bartfit, xtest)
My error is that:
"Error in predict.wbart(bartfit, xtest) : The number of columns in newdata must be equal to 50"
I'm unsure how that is possible, since my data has 28 variables, of which I'm omitting 3 (leaving 25 columns in x).
Upvotes: 1
Views: 814
Reputation: 61
I had this issue when one of my predictor variables was zero across the board. The constant column was removed within the BART fitting function, but not by predict, so the column counts no longer matched.
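If a constant column is the cause, one possible workaround is to drop such columns yourself before fitting, so that the training and test sets keep the same columns. The sketch below uses only base R and hypothetical toy data (the names xtrain_kept and xtest_kept are made up for illustration); it is not taken from the BART package itself.

```r
# Hypothetical toy data: column b is constant in the training set.
xtrain <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 0, 0), c = c(5, 3, 8, 1))
xtest  <- data.frame(a = c(2, 5),       b = c(0, 0),       c = c(7, 4))

# Flag columns that take a single distinct value in the training data.
is_const <- vapply(xtrain, function(col) length(unique(col)) <= 1, logical(1))

# Drop the same columns from both sets so their widths stay in sync.
xtrain_kept <- xtrain[, !is_const, drop = FALSE]
xtest_kept  <- xtest[, !is_const, drop = FALSE]

names(xtest_kept)
```

The filtered xtrain_kept and xtest_kept would then be passed to the fitting function and to predict in place of the originals.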
Upvotes: 1
Reputation: 38
I ended up having the same issue. For me, the issue came from a binary variable, Z, being split into two binary variables, Z1 (1 when Z=1, else 0) and Z2 (1 when Z=0, else 0). Below Z is binary and X1 and X2 are continuous variables.
output <- wbart(x.train = Training[, c("Z", "X1", "X2")],
                y.train = Training[, "Y_obs"])
predict(output, Testing[, c("Z", "X1", "X2")])
The predict call above will give you an error because you are using the original binary variable Z, whereas the BART model is expecting to see Z1 and Z2. I tried manually creating Z1 and Z2 myself (for example: Z2 <- Training$Z and Training$Z1 <- factor(1-as.numeric(Training$Z))), which did NOT work.
Essentially, the variable splitting of Z is coming from the class.ind() command used in bartModelMatrix (to view code, use getAnywhere("bartModelMatrix")). I have extracted code from the bartModelMatrix command to convert my Testing dataset into the same format required by the BART object and it seems to have worked.
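BART_predict_OOB <- function(X) {
  # Expand each factor column of X into indicator columns, mirroring
  # the factor handling extracted from bartModelMatrix.
  p <- ncol(X)
  xnm <- names(X)
  for (i in seq_len(p)) {
    if (is.factor(X[[i]])) {
      # One 0/1 column per factor level, named Z1, Z2, ... for a factor Z
      Xtemp <- class.ind(X[[i]])
      colnames(Xtemp) <- paste(xnm[i], 1:ncol(Xtemp), sep = "")
      X[[i]] <- Xtemp
    } else {
      # Non-factor columns become one-column matrices with the original name
      X[[i]] <- cbind(X[[i]])
      colnames(X[[i]]) <- xnm[i]
    }
  }
  # Bind all pieces column-wise into a single matrix
  Xtemp <- cbind(X[[1]])
  if (p > 1)
    for (i in 2:p) Xtemp <- cbind(Xtemp, X[[i]])
  Xtemp
}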
test <- BART_predict_OOB(X=Testing[, c("Z", "X1", "X2")])
predict(output, test )
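To see why the column count changes, here is a minimal base-R sketch of the expansion described above: a factor column is replaced by one 0/1 column per level, so the matrix the model works with can be wider than the data frame you passed in. The toy data and the helper indicator_matrix() are hypothetical stand-ins written for illustration (the helper imitates what class.ind() does and is not the package's code).

```r
# Hypothetical toy data: Z is a two-level factor, X1 and X2 are numeric.
Testing <- data.frame(
  Z  = factor(c(0, 1, 1, 0)),
  X1 = c(0.5, 1.2, -0.3, 2.0),
  X2 = c(10, 20, 30, 40)
)

# Hand-rolled stand-in for class.ind(): one 0/1 column per factor level.
indicator_matrix <- function(f) {
  lev <- levels(f)
  m <- outer(as.character(f), lev, "==") * 1
  colnames(m) <- lev
  m
}

# Expand Z and name the pieces Z1, Z2, ... as in the function above.
Z_ind <- indicator_matrix(Testing$Z)
colnames(Z_ind) <- paste0("Z", seq_len(ncol(Z_ind)))

# Bind the indicator columns to the numeric columns.
expanded <- cbind(Z_ind, X1 = Testing$X1, X2 = Testing$X2)

ncol(Testing)   # 3 columns going in
ncol(expanded)  # 4 columns after expansion
```

Comparing ncol() of the converted test matrix with the number in the error message is a quick check before calling predict. In the original question, 25 input columns turning into 50 would be consistent with some of the columns being factors, though that is only a guess without seeing the data.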
Upvotes: 2