oercim

Getting an error while calculating feature importances in R

I have the following data:

> paste(data_s)
[1] "c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)"                                                                           
[2] "c(34, 34, 35, 35, 35, 34, 6, 34, 34, 6, 34, 34, 34, 6, 6, 6, 34, 34, 35, 6, 34, 34, 34, 34, 34, 34, 34, 34, 6, 34, 35, 35, 34, 34, 6, 34, 34, 34, 34, 6, 6, 35, 34, 34, 34, 35, 6, 35, 34, 34, 34, 34, 34, 34, 6, 34, 34, 6, 34, 34, 34, 6, 34, 34, 34, 34, 6, 34, 34, 34, 35, 6, 35, 34, 34, 35, 34, 6, 6, 35, 34, 34, 6, 34, 6, 6, 34, 34, 6, 34, 6, 35, 34, 6, 34, 35, 34, 6, 34, 34)"
[3] "c(1, 1, 4, 0, 3, 4, 5, 2, 4, 1, 2, 1, 4, 9, 9, 1, 1, 5, 1, 4, 4, 2, 3, 2, 3, 2, 1, 2, 5, 6, 5, 5, 5, 1, 5, 5, 2, 1, 1, 3, 4, 2, 9, 1, 4, 3, 2, 5, 2, 2, 3, 4, 4, 5, 5, 4, 1, 2, 0, 3, 4, 2, 2, 5, 0, 2, 5, 3, 3, 1, 0, 1, 4, 2, 5, 1, 1, 4, 2, 3, 5, 1, 5, 0, 2, 4, 1, 5, 4, 2, 2, 4, 5, 1, 2, 2, 0, 3, 7, 3)"  

> str(data_s)
tibble [100 × 3] (S3: tbl_df/tbl/data.frame)
 $ y : num [1:100] 0 0 0 0 0 0 0 0 1 0 ...
 $ x1: num [1:100] 34 34 35 35 35 34 6 34 34 6 ...
 $ x2: num [1:100] 1 1 4 0 3 4 5 2 4 1 ...
 - attr(*, "na.action")= 'omit' Named int [1:197659] 4 5 6 7 9 14 19 20 24 27 ...
  ..- attr(*, "names")= chr [1:197659] "4" "5" "6" "7" ...

I am using the vivi function from the vivid package to explore the feature importance of the variables.

I wrote the following code:

library("vivid")
library("dplyr")
library("xgboost")                   


y=data_s["y"]
x=data_s[,c("x1","x2")]

gbst <- xgboost(data = as.matrix(x),
                label = as.matrix(y),
                nrounds = 600)



pFun <- function(fit, data, ...) predict(fit, as.matrix(x))


viviGBst <- vivi(fit = gbst,
                 data = data_s,
                 response = "y",
                 reorder = FALSE,
                 normalized = FALSE,
                 predictFun = pFun)

But I get the following error:

Error:
! Assigned data `predict(x, data = X[, cols, drop = FALSE])` must be compatible with existing data.
✖ Existing data has 5000 rows.
✖ Assigned data has 100 rows.
ℹ Only vectors of size 1 are recycled.
Run `rlang::last_error()` to see where the error occurred.

Why do I get this error and how can I fix it?

Any help would be appreciated.

Thanks.

Answers (1)

Electrino

A bit late, but hopefully this helps other users.

To work with xgboost in vivid, the prediction function has to use its 'data' argument rather than a specific data object from your workspace. Your pFun calls predict(fit, as.matrix(x)), so it always returns predictions for the 100 rows of x, whatever data vivid passes in. vivid calls predictFun on data sets it builds itself (in your case one with 5000 rows), and 100 predictions cannot be assigned to a 5000-row data set, which is exactly what the error message reports. Inside pFun, select the predictor columns from 'data' so they match the columns the model was trained on. The response should not be one of the training features (otherwise the model can simply read y off its inputs and the importance scores become meaningless); pass it only through label.

The code below should solve the issue:

library("vivid")
library("xgboost")


# create data (same columns as in the question):
data_s <- data.frame(
y  = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
x1 = c(34, 34, 35, 35, 35, 34, 6, 34, 34, 6, 34, 34, 34, 6, 6, 6, 34, 34, 35, 6, 34, 34, 34, 34, 34, 34, 34, 34, 6, 34, 35, 35, 34, 34, 6, 34, 34, 34, 34, 6, 6, 35, 34, 34, 34, 35, 6, 35, 34, 34, 34, 34, 34, 34, 6, 34, 34, 6, 34, 34, 34, 6, 34, 34, 34, 34, 6, 34, 34, 34, 35, 6, 35, 34, 34, 35, 34, 6, 6, 35, 34, 34, 6, 34, 6, 6, 34, 34, 6, 34, 6, 35, 34, 6, 34, 35, 34, 6, 34, 34),
x2 = c(1, 1, 4, 0, 3, 4, 5, 2, 4, 1, 2, 1, 4, 9, 9, 1, 1, 5, 1, 4, 4, 2, 3, 2, 3, 2, 1, 2, 5, 6, 5, 5, 5, 1, 5, 5, 2, 1, 1, 3, 4, 2, 9, 1, 4, 3, 2, 5, 2, 2, 3, 4, 4, 5, 5, 4, 1, 2, 0, 3, 4, 2, 2, 5, 0, 2, 5, 3, 3, 1, 0, 1, 4, 2, 5, 1, 1, 4, 2, 3, 5, 1, 5, 0, 2, 4, 1, 5, 4, 2, 2, 4, 5, 1, 2, 2, 0, 3, 7, 3)
)

# fit the model on the predictors only; the response goes in 'label':
gbst <- xgboost(data = as.matrix(data_s[, c("x1", "x2")]),
                label = data_s$y,
                nrounds = 600)

# use the 'data' argument that vivid passes in, not a global object,
# and keep only the predictor columns the model was trained on:
pFun <- function(fit, data, ...) predict(fit, as.matrix(data[, c("x1", "x2")]))

# run vivid
viviGBst <- vivi(fit = gbst,
                 data = data_s,
                 response = "y",
                 reorder = FALSE,
                 normalized = FALSE,
                 predictFun = pFun)
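The row-count mismatch behind the error can be reproduced without vivid or xgboost. The sketch below is only an illustration under stated assumptions: it uses base R's lm on simulated data (standing in for data_s and the xgboost model), and returns_one_per_row is a hypothetical helper, not part of vivid. It shows that a predict function which ignores its 'data' argument returns 100 predictions for a 5000-row data set, while one that uses 'data' returns one prediction per row.

```r
# Simulated stand-in for data_s (hypothetical values, not the question's data):
# 100 rows with a response y and two predictors x1 and x2.
set.seed(42)
train <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
train$y <- 2 * train$x1 + rnorm(100)

# A simple linear model stands in for the xgboost fit.
fit <- lm(y ~ x1 + x2, data = train)

# Like the question's pFun: ignores 'data' and always predicts on the
# 100-row object 'train' from the enclosing environment.
bad_pFun <- function(fit, data, ...) predict(fit, newdata = train)

# Like the fix: predicts on the rows it is given, using the predictor columns.
good_pFun <- function(fit, data, ...) predict(fit, newdata = data[, c("x1", "x2")])

# A 5000-row data set built from the same columns, the size reported
# in the error message.
big <- train[rep(seq_len(nrow(train)), 50), ]

# Hypothetical helper: TRUE when a predict function returns exactly one
# prediction per row of the data it receives.
returns_one_per_row <- function(pFun, fit, data) {
  length(pFun(fit, data)) == nrow(data)
}

returns_one_per_row(bad_pFun, fit, big)   # FALSE: 100 predictions for 5000 rows
returns_one_per_row(good_pFun, fit, big)  # TRUE
```

Running a similar check on your own pFun with a few rows of data_s before calling vivi should flag a function that ignores its 'data' argument early, instead of deep inside vivid's calculations.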
