Reputation: 41
I'm having some issues using SMOTE from the smotefamily package; my code keeps throwing this error:
Error in get.knnx(data, query, k, algorithm) : Data non-numeric
I'm new to R. I'm trying to make the following work:
dados_treino_bal <- SMOTE(X = dados_treino, target = dados_treino$Inadimplente, K = ~ ., dup_size = 0)
SMOTE(X, target, K = 5, dup_size = 0)
My dataset is correct with the intended factors, so not all of the data is numeric, but that's how it's supposed to be, right?
For K, I'm using ~ . to indicate that I want all predictor variables.
Upvotes: 4
Views: 3141
Reputation: 367
You can use this function to check the columns and convert all character and factor predictors to numeric:
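preditores <- setdiff(names(dados_treino), "Inadimplente")  # convert predictors only, not the class column
dados_treino[preditores] <- lapply(dados_treino[preditores], function(x) {
  if (is.character(x) || is.factor(x)) {
    num <- suppressWarnings(as.numeric(as.character(x)))
    # Numbers stored as text (e.g. "12") keep their values;
    # labels (e.g. "sim"/"nao") would turn into NA, so use integer level codes instead
    if (all(is.na(num) == is.na(x))) num else as.numeric(factor(x))  # existing NAs stay NA
  } else {
    x  # keep numeric columns as they are
  }
})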
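To see what the conversion produces, here is a minimal sketch on a small, made-up data frame. The name `exemplo`, its columns (`renda`, `estado_civil`, `parcelas`), their values and the helper `para_numerico` are all hypothetical. Label-type columns become integer level codes, which is one simple choice; SMOTE will then interpolate between codes (values such as 1.4 can appear in synthetic rows), so one-hot encoding is a common alternative if that matters for your model.

```r
# Hypothetical toy data: a numeric column, a label-type factor,
# numbers stored as text, and the class column.
exemplo <- data.frame(
  renda        = c(1200, 3400, 2500, 800),
  estado_civil = factor(c("solteiro", "casado", "casado", "solteiro")),
  parcelas     = c("12", "24", "6", "12"),
  Inadimplente = factor(c("sim", "nao", "nao", "nao"))
)

# Hypothetical helper: convert one character/factor column to numeric.
para_numerico <- function(x) {
  if (!(is.character(x) || is.factor(x))) return(x)  # already numeric
  num <- suppressWarnings(as.numeric(as.character(x)))
  # "12", "24" keep their values; "casado"/"solteiro" become codes 1, 2
  if (all(is.na(num) == is.na(x))) num else as.numeric(factor(x))
}

# Convert the predictors only; the target keeps its labels.
preditores <- setdiff(names(exemplo), "Inadimplente")
exemplo[preditores] <- lapply(exemplo[preditores], para_numerico)

str(exemplo)
sapply(exemplo[preditores], is.numeric)  # all TRUE
```

Running `sapply(..., is.numeric)` right before calling SMOTE is a quick way to catch any column that would still trigger the "Data non-numeric" error.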
In addition, SMOTE is very sensitive to NAs. Use str() to check your data before applying it, and remove incomplete rows if needed:
dados_treino<- na.omit(dados_treino)
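Besides str(), base R can count the missing values per column directly. A minimal sketch on a hypothetical toy data frame `exemplo` (columns and values are made up): note that na.omit() drops every row with at least one NA, which can also remove minority-class rows that SMOTE needs.

```r
# Hypothetical toy data with missing values in two columns.
exemplo <- data.frame(
  renda        = c(1200, NA, 2500, 800, 1500),
  parcelas     = c(12, 24, NA, 12, 6),
  Inadimplente = c("sim", "nao", "nao", "sim", "nao")
)

colSums(is.na(exemplo))       # number of NAs in each column
nrow(exemplo)                 # rows before removal

exemplo <- na.omit(exemplo)   # drop every row that contains any NA
nrow(exemplo)                 # rows after removal
any(is.na(exemplo))           # FALSE once all incomplete rows are gone
```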
One more remark: you are passing all of dados_treino as X, which includes the class variable, so you need to drop it from X.
Here is a reformulation that may help:
library(dplyr)  # provides %>% and select()
dados_treino_bal <- SMOTE(dados_treino %>% select(-Inadimplente), target = dados_treino$Inadimplente)
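The same call can be sketched in base R, without dplyr. It also covers the K argument from the question: in smotefamily, K is the number of nearest neighbours (an integer, default 5), not a formula, so `K = ~ .` should be removed or replaced with a number. The toy data `exemplo` and its columns are hypothetical, and the requireNamespace() guard only keeps the sketch runnable where smotefamily is not installed. SMOTE returns a list; the balanced data set is in its `data` element, with the labels in a column named `class`.

```r
# Hypothetical, already-numeric toy data: 20 majority ("nao") and 6 minority ("sim") rows.
set.seed(42)
exemplo <- data.frame(
  renda        = c(rnorm(20, 3000, 500), rnorm(6, 1500, 300)),
  parcelas     = c(sample(6:24, 20, replace = TRUE), sample(12:36, 6, replace = TRUE)),
  Inadimplente = c(rep("nao", 20), rep("sim", 6))
)

# X: predictors only (class column removed); target: the class labels.
X      <- exemplo[, setdiff(names(exemplo), "Inadimplente"), drop = FALSE]
target <- exemplo$Inadimplente

# K is an integer number of neighbours (must be smaller than the minority class size).
if (requireNamespace("smotefamily", quietly = TRUE)) {
  res <- smotefamily::SMOTE(X = X, target = target, K = 5, dup_size = 0)
  exemplo_bal <- res$data             # original + synthetic rows
  print(table(exemplo_bal$class))     # class counts after balancing
}
```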
Upvotes: 0