Pawel

Reputation: 657

Incorporating observation weights in the randomForest package

How can I use the R randomForest package with observation weights? I know that there is no such option in this package. I have 2 questions:

  1. Are there any solutions to this problem using the randomForest package? At the moment I'm resampling the data with the weights as sampling probabilities, so I can at least simulate it:

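    m <- nrow(data)
    # Resample row indices (not columns) with probability proportional to the weights
    idx <- sample(m, m, replace = TRUE, prob = weights)
    resampled <- data[idx, ]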
    

    It works, but are there other (better) solutions?

  2. Are there any alternatives to the randomForest package? I found the party package (cforest), but it's terrible in terms of memory management (or I cannot use it the way I use the randomForest package). I have around 200k observations and 30-40 variables.

EDIT:

Sorry for not clarifying the details. I'm using the randomForest package for a regression problem (not classification). The data is a time series and every observation has its own weight. Later on, this weight is used to measure model performance across the test observations. The y variable is continuous.

Upvotes: 19

Views: 5792

Answers (2)

Ooona

Reputation: 103

I was looking for the same option as you, Pawel. I found that the R package "ranger" supports it in its function "ranger" (through the parameter "case.weights").

The package was released in June 2016, so it is very young.
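A minimal sketch of how this could look for a regression problem. The toy data, the column names (x1, x2, y) and the weight vector w are made up for illustration; only "ranger", "case.weights" and "predict" come from the package. As far as I understand it, "case.weights" changes how often each observation is drawn into the per-tree samples, so it is close in spirit to the manual resampling in the question:

    library(ranger)

    set.seed(1)
    n <- 500

    # Hypothetical toy data: two predictors and a continuous response
    dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
    dat$y <- 2 * dat$x1 + sin(6 * dat$x2) + rnorm(n, sd = 0.3)

    # Hypothetical observation weights, e.g. later observations count more
    w <- seq(0.1, 1, length.out = n)

    # Observations with larger weights are drawn more often for each tree
    fit <- ranger(y ~ x1 + x2, data = dat, num.trees = 200,
                  case.weights = w, seed = 1)

    # Predictions (here on the training data, just to show the call)
    pred <- predict(fit, data = dat)$predictions

    # Weighted mean squared error, using the same weights for evaluation
    wmse <- sum(w * (dat$y - pred)^2) / sum(w)

For a real evaluation you would compute the weighted error on held-out test observations with their own weights, not on the training data.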

Best,

Upvotes: 3

IRTFM

Reputation: 263331

randomForest does have a "classwt" parameter that should allow you to account for differential sampling probabilities or even for differential costs. Admittedly, it is ignored for regression. Perhaps you should explain why you need to use weighting and what sort of y variable you are using.

Upvotes: 2
