Reputation: 1485
I am trying to upsample an imbalanced dataset in R using the upSample function in caret. However, after applying the function, the target variable C_flag is missing from the resulting dataset. Here is my code:
set.seed(100)
'%ni%' <- Negate('%in%')
up_train <- upSample(x = train[, colnames(train) %ni% "C_flag"],  # all predictor variables
                     y = train$C_flag)                            # target variable
Here is the count of each C_flag category in the train set: 0 = 100193, 1 = 29651.
I check whether C_flag is present and get this result:
print(up_train$C_flag)
NULL
Does anyone know why this function is removing this variable instead of upsampling?
Upvotes: 3
Views: 2506
Reputation: 2266
The first thing that comes to mind is whether C_flag is a factor or not. Anyway, I tried this sample dataset:
library(tidyverse)
library(caret)
train <- data.frame(x1 = c(2, 3, 4, 2, 3, 3, 3, 8),
                    x2 = c(1, 2, 1, 2, 4, 1, 1, 4),
                    C_flag = c("A", "B", "B", "A", "A", "A", "A", "A"))
train$C_flag <- as.factor(train$C_flag)
'%ni%' <- Negate('%in%')
up_train <- upSample(x = train[, colnames(train) %ni% "C_flag"],
                     y = train$C_flag)
up_train$C_flag
It returned NULL. Why? Because upSample names the target column "Class" by default, so the column is still there, just under a different name. If you want the target column to be called C_flag, pass that name through the yname argument:
up_train <- upSample(x = train[, colnames(train) %ni% "C_flag"],
                     y = train$C_flag,
                     yname = "C_flag")
print(up_train$C_flag)
[1] A A A A A A B B B B B B
Levels: A B
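To confirm that the target is renamed rather than dropped, it may help to inspect the column names and class counts of the result. The following is a minimal sketch on the same toy data; the object names predictors, up_default and up_named are only illustrative, and setdiff() is used here in place of the %ni% helper.

```r
library(caret)

# Toy data mirroring the example above: 6 rows of class A, 2 rows of class B
train <- data.frame(x1 = c(2, 3, 4, 2, 3, 3, 3, 8),
                    x2 = c(1, 2, 1, 2, 4, 1, 1, 4),
                    C_flag = factor(c("A", "B", "B", "A", "A", "A", "A", "A")))

set.seed(100)

# All columns except the target (illustrative name)
predictors <- train[, setdiff(colnames(train), "C_flag")]

# Without yname, the outcome column ends up under the default name "Class"
up_default <- upSample(x = predictors, y = train$C_flag)
names(up_default)
table(up_default$Class)

# With yname, the outcome column keeps the original name
up_named <- upSample(x = predictors, y = train$C_flag, yname = "C_flag")
table(up_named$C_flag)
```

If the counts in both tables are equal across classes, the upsampling worked and only the column name differed.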
Upvotes: 3