tzirtzi

Reputation: 125

Bivariate partial dependence with randomForest in R

I have a dataset with a binary dependent variable and a number of predictors, including participant. I am trying to examine the idiosyncratic effects of different predictors for different participants. In order to do that, I'm trying to look at the effect of interactions between participant id and the other predictors on the dependent variable. I'm using randomForest in R. I can fit the forest successfully, and can produce partial dependence plots for individual variables. What I need, however, are partial dependence plots for pairs of variables - participant + others. Is this possible?

For reference, my code:

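library(randomForest)
# Sample 500 rows without replacement, then fit the forest on the sample
data_sample <- data_raw[sample(nrow(data_raw), 500, replace = FALSE), ]
test_rf <- randomForest(perceptually.rhotic ~ vowel + speaker + modified_clip_start + function_word + year_of_birth + gender + fathers_job_type + prepausal, data = data_sample, ntree = 500, mtry = 3)
# Univariate partial dependence on speaker (this works)
partialPlot(test_rf, pred.data = data_sample, x.var = "speaker")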

??? partialPlot(test_rf,pred.dat=data_sample,x.var=c("speaker","vowel"));

Thanks very much in advance for any advice anyone can offer!

Upvotes: 0

Views: 1480

Answers (1)

Stephen Milborrow

Reputation: 1016

The plotmo R package will plot partial dependencies for all variables and pairs of variables (bivariate dependencies) for "any" model. For example:

library(randomForest)
data(trees)
mod <- randomForest(Volume~., data=trees)
library(plotmo)
plotmo(mod, pmethod="partdep") # plot partial dependencies

which gives

[plot: partial dependence plots produced by plotmo for the trees model]
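To make explicit what a bivariate partial dependence is, here is a minimal hand-rolled sketch using only randomForest and base R. It is not how plotmo is implemented internally; it just follows the usual definition: for each pair of grid values, overwrite both variables in every row of the data and average the model's predictions. The names girth_grid, height_grid, and pd are my own, and the 10-point grid is an arbitrary choice.

```r
library(randomForest)

data(trees)
set.seed(1)  # make the forest reproducible
mod <- randomForest(Volume ~ ., data = trees)

# Grid of values for each of the two variables (10 points each, arbitrary)
girth_grid  <- seq(min(trees$Girth),  max(trees$Girth),  length.out = 10)
height_grid <- seq(min(trees$Height), max(trees$Height), length.out = 10)

# Partial dependence at (g, h): set both variables to the grid values in
# every row, predict, and average over the rows
pd_at <- function(g, h) {
  d <- trees
  d$Girth  <- g
  d$Height <- h
  mean(predict(mod, newdata = d))
}

# 10 x 10 matrix of averaged predictions over the grid
pd <- outer(girth_grid, height_grid, Vectorize(pd_at))

# Surface plot of the bivariate partial dependence
persp(girth_grid, height_grid, pd,
      xlab = "Girth", ylab = "Height", zlab = "Volume",
      theta = 30, phi = 20)
```

Because trees has only these two predictors, averaging over the rows is trivial here; with more predictors the averaging is what removes the influence of the other variables.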

You can specify exactly which variables and variable pairs get plotted using plotmo's all1, all2, degree1, and degree2 arguments. Additional examples are in the vignette for the plotmo package.
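As a rough sketch of how those arguments combine, the call below asks for only the Girth:Height pair and suppresses the single-variable plots. It assumes, as I read the plotmo documentation, that degree1 = FALSE turns off the degree1 plots and that degree2 accepts a character vector naming the pair; check ?plotmo for your installed version.

```r
library(randomForest)
library(plotmo)

data(trees)
set.seed(1)  # make the forest reproducible
mod <- randomForest(Volume ~ ., data = trees)

# Bivariate partial dependence for one chosen pair only:
#   degree1 = FALSE  -> no single-variable plots
#   all2 = TRUE      -> consider all variable pairs, not just the default subset
#   degree2 = c(...) -> keep only the Girth:Height pair
plotmo(mod, pmethod = "partdep",
       degree1 = FALSE,
       all2 = TRUE,
       degree2 = c("Girth", "Height"))
```

For the model in the question, the analogous call would presumably use degree2 = c("speaker", "vowel"), though I have not run it on that data.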

Upvotes: 4
