AriesV

Reputation: 31

Minbucket and weights in rpart

A couple of questions for the rpart and party experts.

1) I am trying to understand how the control parameter "minbucket" differs between rpart and party. Is it correct that minbucket in rpart is unweighted (even if weights are provided to fit the tree)?

2) Can anyone briefly describe how the weights are used in the rpart algorithm? I tried to download and review the source code, but as a newbie I couldn't make much sense of it. rpart calls a C function (C_rpart), which seems to do most of the work, but I couldn't find more information about it.

Thanks so much in advance.

Upvotes: 3

Views: 1179

Answers (1)

Craig

Reputation: 4682

For integer weights, the weights argument in rpart (as in most other machine learning algorithms) can be thought of as duplicating each training row that many times: a weight of 5 acts like having that row repeated 5 times. If your data set is small enough, you can build the expanded data set explicitly:

# repeat each row index as many times as its (integer) weight
data[rep(seq_len(nrow(data)), times = data$weights), ]
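
To make the duplication idea concrete, here is a minimal base R sketch on a made-up data frame. The column names (x, y, weights) and the values are hypothetical, and it assumes the weights are non-negative integers. It only checks that a weighted mean matches the plain mean on the expanded rows; it does not show what rpart does internally.

```r
# Hypothetical toy data: `weights` holds non-negative integer case weights.
data <- data.frame(
  x = c(1, 2, 3),
  y = c(10, 20, 60),
  weights = c(5L, 1L, 2L)
)

# Repeat each row index as many times as its weight, then index by it.
expanded <- data[rep(seq_len(nrow(data)), times = data$weights), ]

# The expanded data has one row per unit of weight.
n_expanded <- nrow(expanded)

# A weighted statistic on the original rows matches the unweighted
# statistic on the expanded rows.
mean_weighted <- weighted.mean(data$y, data$weights)
mean_expanded <- mean(expanded$y)
```

With non-integer weights this trick does not apply directly, since a row cannot be repeated a fractional number of times.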

Upvotes: 1
