RoiMinuit

Predicting with Bayesian Additive Regression Trees

I have built a Bayesian Additive Regression Trees (BART) model with the BART package, but I cannot get it to predict on the test data. I have read the documentation, but it does not help much. Here is my code:

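library(BART)  # provides gbart()

## predictors (columns 4 to 28) and response
x <- nsur.train[, 4:28]
y <- nsur.train$TARGET_AMOUNT

## `train` is a vector of row indices for the training subset
xtrain <- x[train, ]
ytrain <- y[train]

xtest <- x[-train, ]
ytest <- y[-train]

## fit bayesian additive reg tree (third argument is x.test)
bartfit <- gbart(xtrain, ytrain, xtest)

## predict
predict(bartfit, xtest)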

The error is:

"Error in predict.wbart(bartfit, xtest) : The number of columns in newdata must be equal to 50"

I'm not sure how that is possible, since my data has 28 variables and I'm dropping 3 of them, which leaves 25 predictors.
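One way the model can end up with more columns than the data frame it was given is factor expansion: if each factor column is turned into one 0/1 indicator column per level before fitting, the column count grows. A minimal base-R sketch of that count, assuming every factor is expanded into one column per level (the helper `expanded_ncol` and the toy data are hypothetical):

```r
## Hypothetical helper: number of columns a data frame would occupy after
## each factor is expanded into one 0/1 indicator column per level.
expanded_ncol <- function(df) {
  sum(vapply(df, function(col) {
    if (is.factor(col)) nlevels(col) else 1L
  }, integer(1)))
}

## Toy data: one 3-level factor plus two numeric predictors
toy <- data.frame(
  car_type = factor(c("sedan", "suv", "van", "suv")),
  age      = c(30, 45, 52, 27),
  income   = c(40, 85, 60, 52)
)

ncol(toy)           # 3 columns in the data frame
expanded_ncol(toy)  # 5 columns after factor expansion
```

Running the same helper on `xtrain` would show whether factor expansion accounts for the gap between 25 and 50.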

Upvotes: 1

Views: 814

Answers (2)

Jessica Ayers

I had this issue when one of my predictor variables was zero across the board. BART removed that constant column when fitting the model, but predict() did not remove it from the new data, so the column counts no longer matched.
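If a column that is constant in the training data was dropped during fitting, the same column has to be dropped from the test data before calling predict(). A minimal sketch, assuming the fit removed every training column that takes a single value and that any factors have already been expanded into numeric columns (the helper `drop_constant_cols` and the toy matrices are hypothetical):

```r
## Hypothetical helper: find training columns with only one distinct value
## and remove those same columns from both the training and test data.
drop_constant_cols <- function(xtrain, xtest) {
  keep <- vapply(seq_len(ncol(xtrain)),
                 function(j) length(unique(xtrain[, j])) > 1,
                 logical(1))
  list(train = xtrain[, keep, drop = FALSE],
       test  = xtest[, keep, drop = FALSE])
}

## Toy matrices: column "b" is zero for every training row
xtr <- cbind(a = c(1, 2, 3), b = c(0, 0, 0), c = c(5, 4, 6))
xte <- cbind(a = c(2, 1),    b = c(0, 1),    c = c(3, 7))

res <- drop_constant_cols(xtr, xte)
colnames(res$test)  # "a" "c"
```

Constancy is judged on the training data only, because that is the only data the fit sees; column "b" is dropped from the test set even though it varies there.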

Upvotes: 1

OCarroll

I ran into the same issue. In my case, it came from a binary factor variable, Z, being split into two binary indicator variables, Z1 and Z2 (one 0/1 column for each level of Z). Below, Z is a binary factor and X1 and X2 are continuous variables.

library(BART)  # provides wbart()

## Z is a binary factor; X1 and X2 are continuous
output <- wbart(x.train = Training[, c("Z", "X1", "X2")],
                y.train = Training[, "Y_obs"])

predict(output, Testing[, c("Z", "X1", "X2")])

The predict() call above fails because it passes the original factor Z, whereas the fitted BART model expects the expanded columns Z1 and Z2. I tried creating Z1 and Z2 manually (for example, Z2 <- Training$Z and Training$Z1 <- factor(1-as.numeric(Training$Z))), which did NOT work.
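For reference, the layout that nnet's class.ind() produces can be reproduced in base R: one numeric 0/1 column per level, in the order of levels(Z), here named by appending the column position as bartModelMatrix does. A minimal sketch (the helper `expand_factor` and the toy vector are hypothetical):

```r
## Hypothetical helper mirroring class.ind(): one numeric 0/1 column per
## factor level, ordered as levels(z), named <name>1, <name>2, ...
expand_factor <- function(z, name) {
  z <- as.factor(z)
  ## compare each observation's level code with every level index
  out <- outer(as.integer(z), seq_len(nlevels(z)), "==") * 1
  colnames(out) <- paste0(name, seq_len(nlevels(z)))
  out
}

Z <- factor(c(0, 1, 1, 0))
expand_factor(Z, "Z")
```

With the default levels c("0", "1"), Z1 flags Z == 0 and Z2 flags Z == 1, and both columns are plain numbers rather than factors.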

Essentially, the splitting of Z comes from the class.ind() function called inside bartModelMatrix (to view its code, use getAnywhere("bartModelMatrix")). I extracted the relevant code from bartModelMatrix to convert my Testing dataset into the same format the BART object requires, and it seems to have worked.

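library(nnet)  # class.ind() lives in the nnet package

## Convert a data frame to the matrix layout BART builds internally:
## each factor becomes one 0/1 column per level (Z -> Z1, Z2, ...),
## numeric columns are kept as single named columns.
BART_predict_OOB <- function(X) {
  p <- ncol(X)
  xnm <- names(X)
  cols <- vector("list", p)
  for (i in seq_len(p)) {
    if (is.factor(X[[i]])) {
      Xtemp <- class.ind(X[[i]])
      colnames(Xtemp) <- paste(xnm[i], seq_len(ncol(Xtemp)), sep = "")
      cols[[i]] <- Xtemp
    } else {
      cols[[i]] <- cbind(X[[i]])
      colnames(cols[[i]]) <- xnm[i]
    }
  }
  ## bind the pieces back together in the original column order
  do.call(cbind, cols)
}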

test <- BART_predict_OOB(X = Testing[, c("Z", "X1", "X2")])
predict(output, test)
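One caveat with converting the test set separately: the number of indicator columns follows the factor's levels, not the values present, so Testing$Z needs the same levels, in the same order, as Training$Z. A minimal sketch of re-leveling the test factors from the training data before conversion (the helper `match_levels` and the toy data frames are hypothetical):

```r
## Hypothetical helper: give every factor in `test` the levels (and level
## order) of the corresponding factor in `train`.
match_levels <- function(train, test) {
  for (nm in names(train)) {
    if (is.factor(train[[nm]])) {
      test[[nm]] <- factor(test[[nm]], levels = levels(train[[nm]]))
    }
  }
  test
}

Training_toy <- data.frame(Z = factor(c(0, 1, 0)), X1 = c(1.2, 3.4, 0.5))
Testing_toy  <- data.frame(Z = factor(c(1, 1)),    X1 = c(2.2, 0.9))

nlevels(Testing_toy$Z)                              # 1
nlevels(match_levels(Training_toy, Testing_toy)$Z)  # 2
```

Test values whose level never appeared in training become NA here and need separate handling.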

Upvotes: 2
