I have the below data:
> paste(data_s)
[1] "c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)"
[2] "c(34, 34, 35, 35, 35, 34, 6, 34, 34, 6, 34, 34, 34, 6, 6, 6, 34, 34, 35, 6, 34, 34, 34, 34, 34, 34, 34, 34, 6, 34, 35, 35, 34, 34, 6, 34, 34, 34, 34, 6, 6, 35, 34, 34, 34, 35, 6, 35, 34, 34, 34, 34, 34, 34, 6, 34, 34, 6, 34, 34, 34, 6, 34, 34, 34, 34, 6, 34, 34, 34, 35, 6, 35, 34, 34, 35, 34, 6, 6, 35, 34, 34, 6, 34, 6, 6, 34, 34, 6, 34, 6, 35, 34, 6, 34, 35, 34, 6, 34, 34)"
[3] "c(1, 1, 4, 0, 3, 4, 5, 2, 4, 1, 2, 1, 4, 9, 9, 1, 1, 5, 1, 4, 4, 2, 3, 2, 3, 2, 1, 2, 5, 6, 5, 5, 5, 1, 5, 5, 2, 1, 1, 3, 4, 2, 9, 1, 4, 3, 2, 5, 2, 2, 3, 4, 4, 5, 5, 4, 1, 2, 0, 3, 4, 2, 2, 5, 0, 2, 5, 3, 3, 1, 0, 1, 4, 2, 5, 1, 1, 4, 2, 3, 5, 1, 5, 0, 2, 4, 1, 5, 4, 2, 2, 4, 5, 1, 2, 2, 0, 3, 7, 3)"
> str(data_s)
tibble [100 × 3] (S3: tbl_df/tbl/data.frame)
$ y : num [1:100] 0 0 0 0 0 0 0 0 1 0 ...
$ x1: num [1:100] 34 34 35 35 35 34 6 34 34 6 ...
$ x2: num [1:100] 1 1 4 0 3 4 5 2 4 1 ...
- attr(*, "na.action")= 'omit' Named int [1:197659] 4 5 6 7 9 14 19 20 24 27 ...
..- attr(*, "names")= chr [1:197659] "4" "5" "6" "7" ...
I am using the vivi function from the vivid package to explore the importance of the variables.
I wrote the code below:
library("vivid")
library("dplyr")
library("xgboost")
y=data_s["y"]
x=data_s[,c("x1","x2")]
gbst <- xgboost(data = as.matrix(x),
label = as.matrix(y),
nrounds = 600)
pFun <- function(fit, data, ...) predict(fit, as.matrix(x))
viviGBst <- vivi(fit = gbst,
data = data_s,
response = "y",
reorder = FALSE,
normalized = FALSE,
predictFun = pFun)
But I get the below error:
Error:
! Assigned data `predict(x, data = X[, cols, drop = FALSE])` must be compatible with existing data.
✖ Existing data has 5000 rows.
✖ Assigned data has 100 rows.
ℹ Only vectors of size 1 are recycled.
Run `rlang::last_error()` to see where the error occurred.
Why do I get this error and how can I fix it?
I will be very glad for any help.
Thanks.
A bit late but hopefully this can help other users.
To use xgboost with vivid, the custom predict function must predict on its own data argument rather than on an object from your workspace. Your pFun ignores data and always predicts on x, so it returns 100 values no matter what vivi passes in. Judging by the error message, vivi called it on 5000 rows that it built internally, which is why the sizes do not match.
Your xgboost call itself is fine: the data argument should hold only the explanatory variables, with the response supplied through label. The predict function should then select the same columns the model was trained on.
The code below should solve the issue:
library("vivid")
library("xgboost")
# recreate the data from the question:
data_s <- data.frame(
  y = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  x1 = c(34, 34, 35, 35, 35, 34, 6, 34, 34, 6, 34, 34, 34, 6, 6, 6, 34, 34, 35, 6, 34, 34, 34, 34, 34, 34, 34, 34, 6, 34, 35, 35, 34, 34, 6, 34, 34, 34, 34, 6, 6, 35, 34, 34, 34, 35, 6, 35, 34, 34, 34, 34, 34, 34, 6, 34, 34, 6, 34, 34, 34, 6, 34, 34, 34, 34, 6, 34, 34, 34, 35, 6, 35, 34, 34, 35, 34, 6, 6, 35, 34, 34, 6, 34, 6, 6, 34, 34, 6, 34, 6, 35, 34, 6, 34, 35, 34, 6, 34, 34),
  x2 = c(1, 1, 4, 0, 3, 4, 5, 2, 4, 1, 2, 1, 4, 9, 9, 1, 1, 5, 1, 4, 4, 2, 3, 2, 3, 2, 1, 2, 5, 6, 5, 5, 5, 1, 5, 5, 2, 1, 1, 3, 4, 2, 9, 1, 4, 3, 2, 5, 2, 2, 3, 4, 4, 5, 5, 4, 1, 2, 0, 3, 4, 2, 2, 5, 0, 2, 5, 3, 3, 1, 0, 1, 4, 2, 5, 1, 1, 4, 2, 3, 5, 1, 5, 0, 2, 4, 1, 5, 4, 2, 2, 4, 5, 1, 2, 2, 0, 3, 7, 3)
)
# fit on the explanatory variables only; the response goes in label:
gbst <- xgboost(data = as.matrix(data_s[, c("x1", "x2")]),
                label = data_s$y,
                nrounds = 600)
# predict on the 'data' argument instead of a fixed object,
# selecting the same columns the model was trained on:
pFun <- function(fit, data, ...) predict(fit, as.matrix(data[, c("x1", "x2")]))
# run vivid
viviGBst <- vivi(fit = gbst,
data = data_s,
response = "y",
reorder = FALSE,
normalized = FALSE,
predictFun = pFun)
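To see why a predict function that ignores its data argument fails, here is a minimal sketch in base R that needs neither xgboost nor vivid. It assumes a toy lm model and a made-up 5000-row data frame, grid, standing in for whatever data vivi builds internally. The names train, grid, pFun_bad and pFun_good are illustrative only and are not part of either package.

```r
# Toy data with the same shape as the question: 100 rows, two predictors.
set.seed(1)
train <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
train$y <- 2 * train$x1 - train$x2 + rnorm(100)
fit <- lm(y ~ x1 + x2, data = train)

# A larger evaluation set (assumption: stands in for the rows that
# vivi generates internally; the real data is built by the package).
grid <- data.frame(x1 = rnorm(5000), x2 = rnorm(5000))

# Wrapper that ignores its `data` argument and always predicts on `train`.
pFun_bad <- function(fit, data, ...) predict(fit, newdata = train)

# Wrapper that predicts on whatever data it is handed.
pFun_good <- function(fit, data, ...) predict(fit, newdata = data)

length(pFun_bad(fit, grid))   # one value per row of `train`
length(pFun_good(fit, grid))  # one value per row of `grid`
```

The first wrapper returns one prediction per row of train whatever it is given, which is the kind of size mismatch the error message reports. The second returns one prediction per row of its input, which is what a custom predict function needs to do.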