For label encoding I am using model.matrix in R (it ships with base R's stats package; I also load the onehot library).
The data set is available here. I have renamed the file as train.csv.
The feature to be encoded is Education. It has two levels, Graduate and Not Graduate.
However, on executing the code,
library(readr)   # read_csv()
library(dplyr)   # %>% and select()
library(onehot)
data <- read_csv("train.csv")
set.seed(1234)
# Shuffle the rows
datashuffled <- data[sample(1:nrow(data)), ]
# Drop the target column
datashuffled_Loan_StatusRemoved <- datashuffled %>%
  select(-starts_with("Loan_Status"))
features <- datashuffled_Loan_StatusRemoved
# Count missing values in Education
sum(is.na(features$Education))
features$Education[features$Education == "Not Graduate"] <- "NotGraduate"
E <- model.matrix(~ Education - 1, head(features))
I get the following error:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels
Upvotes: 0
Views: 218
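Sorry, it was a typo. I should have passed the complete dataset to model.matrix. head(features) returns only the first six rows, and here they presumably all share one Education value, so the column becomes a factor with a single level. The fix is to replace
E <- model.matrix(~Education-1,head(features))
with
E <- model.matrix(~Education-1,features)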
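The difference can be reproduced without the original file. The following is a minimal sketch on made-up data: the toy Education values and the names one_level, res and E_head are assumptions for illustration, not part of the loan data set. It also shows one possible alternative, declaring the factor levels explicitly, which lets even a single-level subset be encoded.

```r
# Toy stand-in for the Education column (values are invented for illustration).
features <- data.frame(
  Education = c("Graduate", "Graduate", "Graduate", "NotGraduate", "Graduate"),
  stringsAsFactors = FALSE
)

# A subset containing only one distinct value reproduces the error:
# model.matrix() converts the character column to a one-level factor.
one_level <- features[1:3, , drop = FALSE]
res <- tryCatch(
  model.matrix(~ Education - 1, one_level),
  error = function(e) conditionMessage(e)
)
print(res)

# The full data contains both levels, so the encoding succeeds.
E <- model.matrix(~ Education - 1, features)
print(colnames(E))

# Alternative: declare both levels up front, so any subset keeps two levels.
features$Education <- factor(features$Education,
                             levels = c("Graduate", "NotGraduate"))
E_head <- model.matrix(~ Education - 1, features[1:3, , drop = FALSE])
print(dim(E_head))
```

With explicit levels, the subset yields one column per level, and the column for the unobserved level is all zeros. Whether that is preferable to encoding the full data depends on how the matrix is used afterwards.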
Upvotes: 1