stats_noob

Reputation: 5945

R: "argument is of length 0" (empty plot)

I am using the R programming language and trying to follow this tutorial: https://cran.r-project.org/web/packages/lime/vignettes/Understanding_lime.html

I tried to create my own data to replicate this tutorial with:

#load libraries
library(MASS)
library(lime)
library(randomForest)

#create data
var_1<- rnorm(100,1,4)
var_2 <-rnorm(10,10,5)
var_3<- c("0","2", "4")
var_3 <- sample(var_3, 100, replace=TRUE, prob=c(0.3, 0.6, 0.1))

response<- c("1","0")
response <- sample(response, 100, replace=TRUE, prob=c(0.3, 0.7))

#put them into a data frame called "f"
f <- data.frame(var_1, var_2, var_3, response)

#declare var_3 and response_variable as factors
f$var_3 = as.factor(f$var_3)
f$response = as.factor(f$response)

# run random forest on all the data except the first observation
model<-randomForest(response ~., data = f[-1,] , mtry=2, ntree=100)
model<-as_classifier(model, labels = NULL)

#run the "lime" procedure on the first observation
explainer <- lime(f[-1,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f[-1, ], explainer, n_labels = 1, n_features = 4)
    
#visualize the results - here is the error:
plot_features(explanation, ncol = 1)

Error in if (nrow(explanation) == 0) stop("No explanations to plot", call. = FALSE) : 
  argument is of length zero

Can someone please show me what I am doing wrong? Is it because this procedure is not meant to be run on a single observation?

Thanks

UPDATE: If I change this line of code:

model<-randomForest(response ~., data = f[-1,] , mtry=2, ntree=100)

to

model<-randomForest(response ~., data = f , mtry=2, ntree=100)

the code now seems to run. (Training on all of f is not a big problem in itself: I can hold out f_new = f[1,] and then set f = f[-1,] before this step.) However, the plot does not display fully. Is this a problem with my graphics console? (Note: the tutorial from the website runs perfectly.)
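The order of the two assignments mentioned above matters: if f is overwritten first, f[1,] no longer refers to the original first observation. A minimal base-R sketch on a toy data frame (the id column and the name f_train are hypothetical, added only to make the rows traceable):

```r
# Toy data frame; `id` exists only so we can see which row ended up where
set.seed(1)
f <- data.frame(id = 1:5, x = rnorm(5))

# Hold out the first observation BEFORE dropping it from the training data
f_new   <- f[1, ]    # the observation to explain later
f_train <- f[-1, ]   # everything else, used for fitting

# Pitfall: overwriting f first makes f[1, ] point at the original second row
g <- f
g <- g[-1, ]
wrong_new <- g[1, ]

f_new$id       # id of the held-out row
wrong_new$id   # id picked up by the wrong ordering
```

Keeping the training rows under a separate name such as f_train avoids the problem entirely, because f itself is never modified.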

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] randomForest_4.6-14 lime_0.5.1          MASS_7.3-53        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           lubridate_1.7.9      lattice_0.20-41      class_7.3-17         assertthat_0.2.1    
 [6] glmnet_4.0-2         digest_0.6.25        ipred_0.9-9          foreach_1.5.1        mime_0.9            
[11] R6_2.4.1             plyr_1.8.6           stats4_4.0.2         ggplot2_3.3.2        pillar_1.4.6        
[16] rlang_0.4.7          caret_6.0-86         rstudioapi_0.11      data.table_1.12.8    rpart_4.1-15        
[21] Matrix_1.2-18        shinythemes_1.1.2    labeling_0.3         splines_4.0.2        gower_0.2.2         
[26] stringr_1.4.0        htmlwidgets_1.5.2    munsell_0.5.0        tinytex_0.26         shiny_1.5.0         
[31] compiler_4.0.2       httpuv_1.5.4         xfun_0.15            pkgconfig_2.0.3      shape_1.4.5         
[36] htmltools_0.5.0      nnet_7.3-14          tidyselect_1.1.0     tibble_3.0.3         prodlim_2019.11.13  
[41] codetools_0.2-16     crayon_1.3.4         dplyr_1.0.2          withr_2.3.0          later_1.1.0.1       
[46] recipes_0.1.13       ModelMetrics_1.2.2.2 grid_4.0.2           nlme_3.1-149         xtable_1.8-4        
[51] gtable_0.3.0         lifecycle_0.2.0      magrittr_1.5         pROC_1.16.2          scales_1.1.1        
[56] stringi_1.4.6        farver_2.0.3         reshape2_1.4.4       promises_1.1.1       timeDate_3043.102   
[61] ellipsis_0.3.1       generics_0.0.2       vctrs_0.3.2          xgboost_1.1.1.1      lava_1.6.8          
[66] iterators_1.0.13     tools_4.0.2          glue_1.4.1           purrr_0.3.4          fastmap_1.0.1  

Upvotes: 1

Views: 297

Answers (1)

stats_noob

Reputation: 5945

I might have got it to work. With the model fitted on all of f (the change described in my update), the original code produces this plot:

#load libraries
library(MASS)
library(lime)
library(randomForest)

#create data
var_1<- rnorm(100,1,4)
var_2 <-rnorm(10,10,5)
var_3<- c("0","2", "4")
var_3 <- sample(var_3, 100, replace=TRUE, prob=c(0.3, 0.6, 0.1))

response<- c("1","0")
response <- sample(response, 100, replace=TRUE, prob=c(0.3, 0.7))

#put them into a data frame called "f"
f <- data.frame(var_1, var_2, var_3, response)

#declare var_3 and response_variable as factors
f$var_3 = as.factor(f$var_3)
f$response = as.factor(f$response)

# run random forest on all of the data (the first observation is included here)
model<-randomForest(response ~., data = f , mtry=2, ntree=100)
model<-as_classifier(model, labels = NULL)

#run the "lime" procedure on every observation except the first
explainer <- lime(f[-1,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f[-1, ], explainer, n_labels = 1, n_features = 4)

#visualize the results (this now runs, but the plot does not display fully)
plot_features(explanation, ncol = 1)

(plot image)

I then changed the plotting call (see the last line of the code below):

#load libraries
library(MASS)
library(lime)
library(randomForest)

#create data
var_1<- rnorm(100,1,4)
var_2 <-rnorm(10,10,5)
var_3<- c("0","2", "4")
var_3 <- sample(var_3, 100, replace=TRUE, prob=c(0.3, 0.6, 0.1))

response<- c("1","0")
response <- sample(response, 100, replace=TRUE, prob=c(0.3, 0.7))

#put them into a data frame called "f"
f <- data.frame(var_1, var_2, var_3, response)

#declare var_3 and response_variable as factors
f$var_3 = as.factor(f$var_3)
f$response = as.factor(f$response)

# run random forest on all of the data (the first observation is included here)
model<-randomForest(response ~., data = f , mtry=2, ntree=100)
model<-as_classifier(model, labels = NULL)

#run the "lime" procedure on every observation except the first
explainer <- lime(f[-1,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f[-1, ], explainer, n_labels = 1, n_features = 4)

#visualize the results for the first four explained cases only
plot_features(explanation, cases = 1:4, ncol = 1)


I don't understand what changed, but at least the graphics now show up. Suppose I am interested in only the first observation. I am still unsure whether these lines should be:

explainer <- lime(f[-1,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f[-1, ], explainer, n_labels = 1, n_features = 4)

or

explainer <- lime(f, model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f, explainer, n_labels = 1, n_features = 4)
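For comparison, the linked vignette builds the explainer from the training rows and passes only the held-out rows to explain(). A minimal sketch of that pattern for a single observation follows. It is an adaptation, not a confirmed fix: set.seed(), the names features, train_x, train_y and new_x, drawing var_2 with 100 values instead of 10, and passing only the predictor columns to lime() and explain() are all assumptions made for this sketch.

```r
library(lime)
library(randomForest)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Same kind of data as above, but every column has 100 values
f <- data.frame(
  var_1 = rnorm(100, 1, 4),
  var_2 = rnorm(100, 10, 5),
  var_3 = factor(sample(c("0", "2", "4"), 100, replace = TRUE,
                        prob = c(0.3, 0.6, 0.1))),
  response = factor(sample(c("1", "0"), 100, replace = TRUE,
                           prob = c(0.3, 0.7)))
)

features <- c("var_1", "var_2", "var_3")  # predictors only, no response
train_x  <- f[-1, features]               # rows used to fit and to build the explainer
train_y  <- f$response[-1]
new_x    <- f[1, features]                # the single observation to explain

# Fit on everything except the first observation
model <- randomForest(x = train_x, y = train_y, mtry = 2, ntree = 100)
model <- as_classifier(model)

# Explainer from the training rows, explanation for the held-out row only
explainer   <- lime(train_x, model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(new_x, explainer, n_labels = 1, n_features = 3)

plot_features(explanation, ncol = 1)
```

With only one explained case the plot has a single panel, so no case filter is needed. n_features is 3 here because the response is no longer counted among the features.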

I am also not sure what the difference is between "Probability" and "Explanation Fit" in the plot headers. I assume "Probability" is the class probability predicted by the random forest model, and "Explanation Fit" measures how well the local LIME model fits (its "explanatory power").
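As far as I can tell, "Probability" corresponds to the label_prob column of the explanation data frame and "Explanation Fit" to its model_r2 column, i.e. the R-squared of the local surrogate model. The base-R sketch below only illustrates that idea and is not lime's actual procedure: the black_box function, the point x0, the Gaussian perturbations and the kernel width are all hypothetical, and a plain weighted lm() stands in for the penalised regression lime fits internally.

```r
set.seed(1)

# Hypothetical black-box model: a nonlinear class probability of two features
black_box <- function(x1, x2) plogis(0.8 * x1 - 0.05 * x2^2)

# The single case we want to explain
x0 <- c(x1 = 1, x2 = 3)

# 1. Perturb the case and ask the black box for its prediction on each sample
n <- 2000
perturbed <- data.frame(
  x1 = rnorm(n, mean = x0[["x1"]], sd = 2),
  x2 = rnorm(n, mean = x0[["x2"]], sd = 2)
)
perturbed$p <- black_box(perturbed$x1, perturbed$x2)

# 2. Weight the samples by closeness to the case (kernel width 1.5 is an assumption)
d <- sqrt((perturbed$x1 - x0[["x1"]])^2 + (perturbed$x2 - x0[["x2"]])^2)
w <- exp(-d^2 / (2 * 1.5^2))

# 3. Fit a simple weighted linear surrogate to the black-box output
surrogate <- lm(p ~ x1 + x2, data = perturbed, weights = w)

# Analogue of "Probability": what the black box itself predicts for the case
probability <- black_box(x0[["x1"]], x0[["x2"]])

# Analogue of "Explanation Fit": how well the linear surrogate tracks the black box locally
explanation_fit <- summary(surrogate)$r.squared

probability
explanation_fit
```

Under this reading, a low "Explanation Fit" would mean the linear surrogate tracks the model poorly around that case, so its feature weights should be read with caution, regardless of how high the "Probability" is.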

(If someone knows about this, could they please comment below? thanks)
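On what changed between the two plot calls: as far as I can tell, plot_features() draws one panel per explained case and has a cases argument for selecting a subset. Because explain(f[-1, ], ...) explains 99 observations, the first call tries to fit 99 panels into one column, while the second call restricts the plot to four cases through the cases argument (R also accepts the abbreviation case = through partial argument matching). The base-R sketch below demonstrates only the argument matching, using a hypothetical stand-in function (show_cases is not part of lime) that copies the argument names:

```r
# Hypothetical stand-in with the same argument names as plot_features();
# it returns the cases that would be drawn instead of plotting anything
show_cases <- function(explanation, ncol = 2, cases = NULL) {
  if (is.null(cases)) unique(explanation$case) else cases
}

# Fake explanation table: 99 cases with 4 feature rows each
explanation <- data.frame(case = rep(as.character(1:99), each = 4))

# No filter: every explained case would get its own panel
all_cases <- show_cases(explanation, ncol = 1)

# `case` is not a formal argument, but it partially matches `cases`
some_cases <- show_cases(explanation, case = 1:4, ncol = 1)

length(all_cases)    # 99
length(some_cases)   # 4
```

Writing the full name cases = 1:4 gives the same result and avoids relying on partial matching.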

Upvotes: 1
