timfaber

Reputation: 2070

SMOTE - multiclass

I am applying SMOTE (DMwR package) given that I have a class imbalance problem. However, I have three class outcomes instead of two.

The function correctly oversamples the minority class, but I do not understand how it treats the majority and middle classes (all three classes have different sample sizes).

Let's say:

library(DMwR)

set.seed(1234)

train <- data.frame(group = as.factor(rep(c(1, 2, 3), c(35, 110, 220))),
                    score = rnorm(365, 100))

train_resample <- SMOTE(group ~ ., train, perc.over = 400, perc.under=200)

table(train_resample$group)

#   1   2   3
# 175 104 176

The minority class makes sense: 35 + (35 * 4) = 175. The total size of the remaining sample is also clear: 140 * 200 / 100 = 280. However, I am not sure how these 280 cases are distributed over the two remaining classes. The result preserves the order of the class sizes, but the split might simply be random.
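The arithmetic above can be reproduced in base R. The sketch below also simulates one possible explanation for the split. It assumes (this is not confirmed anywhere in this post) that the 280 remaining cases are drawn at random, with replacement, from all non-minority rows pooled together. Under that assumption the split would follow the 110:220 ratio only on average. All variable names are illustrative.

```r
set.seed(1234)

# Class labels from the example above: 35, 110 and 220 cases.
group <- factor(rep(c(1, 2, 3), c(35, 110, 220)))

perc_over  <- 400
perc_under <- 200

min_class   <- names(which.min(table(group)))   # least frequent class
n_min       <- min(table(group))                # size of that class
n_synthetic <- n_min * perc_over / 100          # new synthetic minority cases
n_minority  <- n_min + n_synthetic              # minority cases after SMOTE
n_rest      <- n_synthetic * perc_under / 100   # cases kept from other classes

# ASSUMPTION: the remaining cases are sampled with replacement from the
# pooled non-minority rows, without regard to which class they belong to.
rest_labels <- droplevels(group[group != min_class])

# Expected split if the draw is proportional to the class sizes.
expected <- n_rest * prop.table(table(rest_labels))

# One random draw under the assumption; it varies with the seed.
draw <- table(sample(rest_labels, n_rest, replace = TRUE))

print(expected)
print(draw)
```

If the assumption holds, individual draws scatter around the expected counts, which would be consistent with a split that keeps the size order without being exactly proportional.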

Any ideas?

Upvotes: 6

Views: 3368

Answers (1)

Suma S N

Reputation: 11

You can try the SmoteClassif() function in the UBL package. It lets you specify, for each class separately, the percentage by which you want to undersample or oversample it.
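A minimal sketch of this suggestion, assuming UBL is installed and that the `C.perc` argument accepts a named list of per-class ratios (values above 1 oversample, values below 1 undersample), as I understand the UBL documentation. The ratios are illustrative choices, not values taken from the question, and class 2 is left out of the list on the assumption that unlisted classes stay unchanged.

```r
set.seed(1234)

# Same data as in the question.
train <- data.frame(group = factor(rep(c(1, 2, 3), c(35, 110, 220))),
                    score = rnorm(365, 100))

# Illustrative per-class ratios (assumed values): grow class 1 fivefold
# and halve class 3. Class 2 is not listed.
class_ratios <- list("1" = 5, "3" = 0.5)

# Run the resampling only if UBL is available, so the sketch does not
# fail on machines without the package.
if (requireNamespace("UBL", quietly = TRUE)) {
  train_resample <- UBL::SmoteClassif(group ~ ., train, C.perc = class_ratios)
  print(table(train_resample$group))
}
```

Unlike a single `perc.under` value, this makes the target size of each class explicit, so the split between the middle and majority classes is no longer left to chance.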

Upvotes: 1
