Reputation: 2070
I am applying SMOTE (DMwR package) because I have a class imbalance problem. However, my outcome has three classes instead of two.
The function oversamples the minority class as expected, but I do not follow what it does with the middle and majority classes (all three classes have different sample sizes).
Let's say:
library(DMwR)
set.seed(1234)
train <- data.frame(group = as.factor(rep(c(1, 2, 3), c(35, 110, 220))),
                    score = rnorm(365, 100))
train_resample <- SMOTE(group ~ ., train, perc.over = 400, perc.under=200)
table(train_resample$group)
# 1 2 3
# 175 104 176
The minority class makes sense: 35 + (35 * 4) = 175. The total for the remaining classes is also clear: 140 * 200 / 100 = 280, which matches 104 + 176. However, I am not sure how these 280 cases are distributed over the two remaining classes. The result preserves the order of the class sizes, but the split might simply be random.
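To make the arithmetic concrete and to probe the "it might be random" guess, here is a minimal base-R sketch. It does not call DMwR. It assumes (this is a hypothesis, not something confirmed above) that the 280 remaining cases are drawn at random, with replacement, from all non-minority rows pooled together. The variable names and the 2000 repetitions are my own choices for illustration.

```r
set.seed(1234)

# Class labels as in the example above: 35 / 110 / 220 cases
group <- factor(rep(c(1, 2, 3), c(35, 110, 220)))

perc_over  <- 400
perc_under <- 200

n_min       <- min(table(group))               # size of the minority class
n_synthetic <- n_min * perc_over / 100         # synthetic minority cases
n_minority  <- n_min + n_synthetic             # minority cases after oversampling
n_rest      <- n_synthetic * perc_under / 100  # cases kept from the other classes

# ASSUMPTION: the remaining cases are sampled with replacement from the
# pooled non-minority rows, so the split follows the 110:220 ratio only on average.
minority_level <- names(which.min(table(group)))
rest  <- droplevels(group[group != minority_level])
draws <- replicate(2000, table(sample(rest, n_rest, replace = TRUE)))

rowMeans(draws)                                     # average count per class
apply(draws, 1, quantile, probs = c(0.025, 0.975))  # typical range per class
```

If the observed 104 / 176 split falls inside the simulated ranges, that is at least consistent with a pooled random draw, although it does not prove that this is what the function does.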
Any ideas?
Upvotes: 6
Views: 3368
Reputation: 11
You can try the SmoteClassif() function in the UBL package. It lets you specify the percentage by which to under-sample or over-sample each class.
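A rough sketch of what that could look like is below. It assumes that UBL is installed and that the C.perc argument accepts a named list keyed by class level, where values above 1 mean over-sampling and values below 1 mean under-sampling, which is how I read the UBL documentation. The multipliers and the second predictor score2 are hypothetical and only there for illustration, so please check the package help before relying on this.

```r
set.seed(1234)

# Same class sizes as in the question; score2 is a hypothetical extra predictor
train <- data.frame(group  = as.factor(rep(c(1, 2, 3), c(35, 110, 220))),
                    score  = rnorm(365, 100),
                    score2 = rnorm(365, 50))

# ASSUMED per-class multipliers: over-sample class 1, leave class 2 alone,
# under-sample class 3. Adjust these to the balance you actually want.
c_perc <- list("1" = 5, "2" = 1, "3" = 0.5)

if (requireNamespace("UBL", quietly = TRUE)) {
  balanced <- UBL::SmoteClassif(group ~ ., train, C.perc = c_perc, k = 5)
  print(table(balanced$group))
} else {
  message("UBL is not installed; skipping the resampling step.")
}
```

The point of this approach is that each class gets its own explicit multiplier, so the sizes of the middle and majority classes are set separately rather than by one pooled draw.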
Upvotes: 1