Reputation: 5945
I am using the R programming language and trying to follow this tutorial: https://cran.r-project.org/web/packages/lime/vignettes/Understanding_lime.html
I created my own data to replicate the tutorial:
#load libraries
library(MASS)
library(lime)
library(randomForest)
#create data
var_1<- rnorm(100,1,4)
var_2 <-rnorm(10,10,5)
var_3<- c("0","2", "4")
var_3 <- sample(var_3, 100, replace=TRUE, prob=c(0.3, 0.6, 0.1))
response<- c("1","0")
response <- sample(response, 100, replace=TRUE, prob=c(0.3, 0.7))
#put them into a data frame called "f"
f <- data.frame(var_1, var_2, var_3, response)
#declare var_3 and response_variable as factors
f$var_3 = as.factor(f$var_3)
f$response = as.factor(f$response)
# run random forest on all the data except the first observation
model<-randomForest(response ~., data = f[-1,] , mtry=2, ntree=100)
model<-as_classifier(model, labels = NULL)
#run the "lime" procedure on the first observation
explainer <- lime(f[-1,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f[-1, ], explainer, n_labels = 1, n_features = 4)
#visualize the results - here is the error:
plot_features(explanation, ncol = 1)
Error in if (nrow(explanation) == 0) stop("No explanations to plot", call. = FALSE) :
argument is of length zero
Can someone please show me what I am doing wrong? Is it because this procedure is not meant to be run on a single observation?
Thanks
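A side note on the data setup above: `rnorm(10,10,5)` returns only 10 values, and `data.frame()` silently recycles them to fill the 100 rows. The base-R sketch below shows the effect. Whether 100 independent draws were intended is an assumption, and the `_short`/`_full` names are made up for illustration.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

var_1       <- rnorm(100, 1, 4)
var_2_short <- rnorm(10, 10, 5)    # 10 draws, as in the code above
var_2_full  <- rnorm(100, 10, 5)   # 100 draws, if that was the intent

# data.frame() recycles the 10 values ten times without a warning,
# because 100 is a whole multiple of 10
f_short <- data.frame(var_1, var_2 = var_2_short)
f_full  <- data.frame(var_1, var_2 = var_2_full)

# count how many distinct var_2 values each data frame really contains
n_distinct_short <- length(unique(f_short$var_2))
n_distinct_full  <- length(unique(f_full$var_2))
```

This does not cause the error, but it does mean `var_2` carries far less variation than the other columns.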
UPDATE: If I change this line of code:
model<-randomForest(response ~., data = f[-1,] , mtry=2, ntree=100)
to
model<-randomForest(response ~., data = f , mtry=2, ntree=100)
the code now seems to run. (This is not a big problem: I can just write f_new = f[1,] and then f = f[-1,] prior to running this step.) However, the plot is not fully showing up. Is this a problem with my graphics console? (Note: the tutorial from the website runs perfectly.)
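One thing to watch with that workaround is the order of the two assignments: once `f` has been overwritten with `f[-1,]`, `f[1,]` refers to what used to be the second row. A small base-R sketch with a hypothetical toy data frame (`f_demo` and the `_a`/`_b` names are illustrative only):

```r
# hypothetical toy data; the id column makes it easy to see which row is which
f_demo <- data.frame(id = 1:5, x = c(10, 20, 30, 40, 50))

# Order A: drop the first row, then take "the first row" of what is left
f_a     <- f_demo[-1, ]
f_new_a <- f_a[1, ]       # this is original row 2, not row 1

# Order B: hold out the first row first, then drop it
f_new_b <- f_demo[1, ]    # original row 1
f_b     <- f_demo[-1, ]   # the remaining four rows
```

Order B is the one that keeps the first observation as the held-out case.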
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] randomForest_4.6-14 lime_0.5.1 MASS_7.3-53
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 lubridate_1.7.9 lattice_0.20-41 class_7.3-17 assertthat_0.2.1
[6] glmnet_4.0-2 digest_0.6.25 ipred_0.9-9 foreach_1.5.1 mime_0.9
[11] R6_2.4.1 plyr_1.8.6 stats4_4.0.2 ggplot2_3.3.2 pillar_1.4.6
[16] rlang_0.4.7 caret_6.0-86 rstudioapi_0.11 data.table_1.12.8 rpart_4.1-15
[21] Matrix_1.2-18 shinythemes_1.1.2 labeling_0.3 splines_4.0.2 gower_0.2.2
[26] stringr_1.4.0 htmlwidgets_1.5.2 munsell_0.5.0 tinytex_0.26 shiny_1.5.0
[31] compiler_4.0.2 httpuv_1.5.4 xfun_0.15 pkgconfig_2.0.3 shape_1.4.5
[36] htmltools_0.5.0 nnet_7.3-14 tidyselect_1.1.0 tibble_3.0.3 prodlim_2019.11.13
[41] codetools_0.2-16 crayon_1.3.4 dplyr_1.0.2 withr_2.3.0 later_1.1.0.1
[46] recipes_0.1.13 ModelMetrics_1.2.2.2 grid_4.0.2 nlme_3.1-149 xtable_1.8-4
[51] gtable_0.3.0 lifecycle_0.2.0 magrittr_1.5 pROC_1.16.2 scales_1.1.1
[56] stringi_1.4.6 farver_2.0.3 reshape2_1.4.4 promises_1.1.1 timeDate_3043.102
[61] ellipsis_0.3.1 generics_0.0.2 vctrs_0.3.2 xgboost_1.1.1.1 lava_1.6.8
[66] iterators_1.0.13 tools_4.0.2 glue_1.4.1 purrr_0.3.4 fastmap_1.0.1
Upvotes: 1
Views: 297
Reputation: 5945
I might have got it to work. As per the original code I was using, here is the plot:
#load libraries
library(MASS)
library(lime)
library(randomForest)
#create data
var_1<- rnorm(100,1,4)
var_2 <-rnorm(10,10,5)
var_3<- c("0","2", "4")
var_3 <- sample(var_3, 100, replace=TRUE, prob=c(0.3, 0.6, 0.1))
response<- c("1","0")
response <- sample(response, 100, replace=TRUE, prob=c(0.3, 0.7))
#put them into a data frame called "f"
f <- data.frame(var_1, var_2, var_3, response)
#declare var_3 and response_variable as factors
f$var_3 = as.factor(f$var_3)
f$response = as.factor(f$response)
# run random forest on all the data
model<-randomForest(response ~., data = f , mtry=2, ntree=100)
model<-as_classifier(model, labels = NULL)
#run the "lime" procedure on the first observation
explainer <- lime(f[-1,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f[-1, ], explainer, n_labels = 1, n_features = 4)
#visualize the results (no error now, but the plot does not fully show up)
plot_features(explanation, ncol = 1)
I then changed the code (see below):
#load libraries
library(MASS)
library(lime)
library(randomForest)
#create data
var_1<- rnorm(100,1,4)
var_2 <-rnorm(10,10,5)
var_3<- c("0","2", "4")
var_3 <- sample(var_3, 100, replace=TRUE, prob=c(0.3, 0.6, 0.1))
response<- c("1","0")
response <- sample(response, 100, replace=TRUE, prob=c(0.3, 0.7))
#put them into a data frame called "f"
f <- data.frame(var_1, var_2, var_3, response)
#declare var_3 and response_variable as factors
f$var_3 = as.factor(f$var_3)
f$response = as.factor(f$response)
# run random forest on all the data
model<-randomForest(response ~., data = f , mtry=2, ntree=100)
model<-as_classifier(model, labels = NULL)
#run the "lime" procedure on the first observation
explainer <- lime(f[-1,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f[-1, ], explainer, n_labels = 1, n_features = 4)
#visualize the results for a few cases only
plot_features(explanation, cases = 1:4, ncol = 1)
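For what it is worth, `plot_features()` draws one panel per explained case, and `explain(f[-1, ], ...)` explains 99 rows, so with `ncol = 1` the full plot has to stack 99 panels. Restricting the cases leaves only a handful. The sketch below imitates this with a hypothetical stand-in for the `explanation` data frame; the assumptions are that case labels come from the row names of `f[-1, ]` (so "2" to "100") and that the selection is a simple match on those labels. The feature names are placeholders.

```r
# Hypothetical stand-in for the data frame returned by explain():
# 99 explained rows, 4 feature rows per case
mock_explanation <- data.frame(
  case    = rep(as.character(2:100), each = 4),
  feature = rep(paste0("feat_", 1:4), times = 99),
  stringsAsFactors = FALSE
)

# panels needed when every case is plotted
n_panels_all <- length(unique(mock_explanation$case))

# panels left after keeping only the cases labelled 1 to 4
# (under the row-name assumption there is no case "1" in f[-1, ])
kept <- mock_explanation[mock_explanation$case %in% 1:4, , drop = FALSE]
n_panels_kept <- length(unique(kept$case))
```

If this is what is happening, enlarging the plot window or saving to a tall image file should also make the full plot legible.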
I don't understand what changed - but at least the graphics now show up. Suppose I am interested in only the first observation. I am still confused whether these lines should be:
explainer <- lime(f[-1,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f[-1, ], explainer, n_labels = 1, n_features = 4)
or
explainer <- lime(f, model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f, explainer, n_labels = 1, n_features = 4)
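In the linked vignette, `lime()` is given the training predictors only (`iris_train`, without the `Species` column) and `explain()` is given the held-out rows (`iris_test`). A minimal sketch of the same split for this data, assuming the goal is to explain only the first observation: the names `predictors`, `f_new`, `f_train` and `y_train` are illustrative, `var_2` is drawn with 100 values here, and the `lime` calls are left as comments because they are not run in this sketch.

```r
set.seed(42)  # arbitrary seed for reproducibility

f <- data.frame(
  var_1    = rnorm(100, 1, 4),
  var_2    = rnorm(100, 10, 5),
  var_3    = factor(sample(c("0", "2", "4"), 100, replace = TRUE,
                           prob = c(0.3, 0.6, 0.1))),
  response = factor(sample(c("1", "0"), 100, replace = TRUE,
                           prob = c(0.3, 0.7)))
)

# keep the response out of what lime() and explain() see
predictors <- setdiff(names(f), "response")

f_new   <- f[1, predictors]    # the single observation to explain
f_train <- f[-1, predictors]   # predictors of the remaining 99 rows
y_train <- f$response[-1]      # labels of the remaining 99 rows

# These would then be used along the lines of the vignette (not run here):
#   explainer   <- lime(f_train, model, bin_continuous = TRUE, quantile_bins = FALSE)
#   explanation <- explain(f_new, explainer, n_labels = 1, n_features = 3)
```

With only three predictors left, `n_features = 3` is the most that can be requested in this setup.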
I am also not sure what the difference is between "probability" and "explanation fit". I assume "probability" is the probability generated by the random forest model, and "explanation fit" measures the "explanatory power" of the LIME model.
(If someone knows about this, could they please comment below? thanks)
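As far as I can tell from the `explain()` documentation, "Probability" corresponds to the `label_prob` column (the model's predicted probability for the explained label) and "Explanation Fit" to `model_r2` (the quality of the local model used for the explanation). The toy sketch below illustrates the two quantities in base R. It is not lime's implementation: `black_box()` is a made-up stand-in for the random forest, the kernel width is arbitrary, and a weighted `lm()` is swapped in for the local model that lime fits.

```r
set.seed(123)  # arbitrary seed for reproducibility

# hypothetical black-box model returning a class probability
black_box <- function(x1, x2) plogis(1.5 * x1 - 0.5 * x2^2)

# the observation to explain
x1_0 <- 0.2
x2_0 <- 1.0

# "Probability": what the black box itself predicts at that observation
prob_x0 <- black_box(x1_0, x2_0)

# perturb the observation and ask the black box for predictions
n <- 500
perturbed <- data.frame(x1 = rnorm(n, x1_0, 1), x2 = rnorm(n, x2_0, 1))
perturbed$p <- black_box(perturbed$x1, perturbed$x2)

# weight perturbed points by closeness to the observation (arbitrary width)
d <- sqrt((perturbed$x1 - x1_0)^2 + (perturbed$x2 - x2_0)^2)
w <- exp(-d^2 / 0.75^2)

# local linear surrogate; its weighted R-squared plays the role of
# "Explanation Fit": how well the simple model mimics the black box nearby
local_fit <- lm(p ~ x1 + x2, data = perturbed, weights = w)
explanation_fit <- summary(local_fit)$r.squared
```

So a high probability with a low explanation fit would mean the model is confident, but the simple local model behind the bars in the plot describes it poorly.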
Upvotes: 1