geds133

Reputation: 1485

R: upSample in Caret is removing target variable completely

I am trying to upsample an imbalanced dataset in R using the upSample function from caret. However, after applying the function, the target variable C_flag is missing from the resulting dataset. Here is my code:

set.seed(100)
'%ni%' <- Negate('%in%')
up_train <- upSample(x = train[, colnames(train) %ni% "C_flag"], #all predictor variables
                     y = train$C_flag) #target variable

Here are the class counts for C_flag in the training set: 0 = 100193, 1 = 29651.

I check whether C_flag is present and get this result:

print(up_train$C_flag)
NULL

Does anyone know why this function is removing this variable instead of upsampling?

Upvotes: 3

Views: 2506

Answers (1)

Alexis

Reputation: 2266

The first thing that comes to mind is whether train$C_flag is a factor or not. Anyway, I tried this sample dataset:

library(caret)
# Toy data: 6 rows of class "A" and 2 rows of class "B"
train <- data.frame(x1 = c(2,3,4,2,3,3,3,8),
                    x2 = c(1,2,1,2,4,1,1,4),
                    C_flag = c("A","B","B","A","A","A","A","A"))
train$C_flag <- as.factor(train$C_flag) # upSample expects a factor outcome
'%ni%' <- Negate('%in%')
up_train <- upSample(x = train[,colnames(train) %ni% "C_flag"],
                     y = train$C_flag)
up_train$C_flag

It returned NULL. Why? Because upSample names the target column "Class" by default, so the data is still there under a different name. If you want the target column to be called C_flag, pass that name through the yname argument:

up_train <- upSample(x = train[,colnames(train) %ni% "C_flag"],
                     y = train$C_flag,
                     yname = "C_flag")
print(up_train$C_flag)

[1] A A A A A A B B B B B B
Levels: A B
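
If it helps, here is a minimal sketch for checking this yourself on the same toy data. It calls upSample without yname, inspects the column names and class counts, and then renames the column afterwards. The object name up_default and the use of setdiff() to drop the target column are just choices for this illustration, not something from your code.

```r
library(caret)

# Same toy data as above: 6 rows of class "A", 2 rows of class "B"
train <- data.frame(x1 = c(2, 3, 4, 2, 3, 3, 3, 8),
                    x2 = c(1, 2, 1, 2, 4, 1, 1, 4),
                    C_flag = factor(c("A", "B", "B", "A", "A", "A", "A", "A")))

set.seed(100)
# Without yname, the outcome is appended as a column called "Class"
up_default <- upSample(x = train[, setdiff(names(train), "C_flag")],
                       y = train$C_flag)

names(up_default)        # the last column is "Class", not "C_flag"
table(up_default$Class)  # the minority class is resampled up to the majority count

# Renaming afterwards should give the same result as passing yname = "C_flag"
names(up_default)[names(up_default) == "Class"] <- "C_flag"
```

So on your real data, up_train$Class should already hold the upsampled target, and names(up_train) is a quick way to confirm that.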

Upvotes: 3
