Natthy Rackham

Reputation: 41

smotefamily::SMOTE -> Error in get.knnx(data, query, k, algorithm) : Data non-numeric

I'm having some issues using SMOTE from the smotefamily package. My code keeps failing with this error:

Error in get.knnx(data, query, k, algorithm) : Data non-numeric

I'm new to R, and I'm trying to make the following work:

dados_treino_bal <- SMOTE(X = dados_treino, target = dados_treino$Inadimplente, K = ~ ., dup_size = 0)

SMOTE(X, target, K = 5, dup_size = 0)

My dataset is correct, and because it contains factors, not all of the data is numeric. That's how it's supposed to be, right?

For K I'm using ~ . to indicate that I want all predictor variables.

Upvotes: 4

Views: 3141

Answers (1)

phœnix

Reputation: 367

You can use this function to check all character and factor columns and convert them to numeric.

dados_treino[] <- lapply(dados_treino, function(x) {
  if (is.character(x) || is.factor(x)) {
    # Go through character first so factors yield their labels, not level codes.
    # Existing NAs stay NA; values that are not numbers (e.g. "A") also become NA.
    as.numeric(as.character(x))
  } else {
    x  # Keep all other columns (e.g. numeric) as they are
  }
})
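
One caveat with the conversion above: as.numeric(as.character(x)) only gives usable values when the strings or factor labels are themselves numbers (such as "1", "2"). For genuinely categorical columns it returns NA. Here is a minimal sketch of the difference on a made-up data frame (the column names renda, faixa and estado are hypothetical, not taken from the question), with one-hot encoding through base R's model.matrix() as one possible alternative:

```r
# Toy data; all column names here are hypothetical
df <- data.frame(
  renda  = c(1200, 3400, 2100, 5000),
  faixa  = factor(c("1", "2", "2", "3")),     # numeric-looking labels
  estado = factor(c("SP", "RJ", "SP", "MG"))  # real categories
)

# Numeric-looking labels convert cleanly
df$faixa <- as.numeric(as.character(df$faixa))

# Real categories cannot be parsed as numbers, so every value becomes NA
estado_num <- suppressWarnings(as.numeric(as.character(df$estado)))
print(estado_num)

# One alternative: one-hot (dummy) encode the remaining factor columns.
# The "- 1" removes the intercept column from the design matrix.
X_num <- as.data.frame(model.matrix(~ . - 1, data = df))
str(X_num)
```

Whether dummy columns are a sensible input for a nearest-neighbour method like SMOTE depends on your data, so treat this as a starting point and not as the recommended encoding.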

In addition, the SMOTE function is very sensitive to NAs. Use the str() function to check your data before applying it, and drop incomplete rows if needed:

dados_treino <- na.omit(dados_treino)
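
Keep in mind that na.omit() removes every row that has an NA in any column, so it can be worth counting the missing values per column first. A small sketch on a made-up data frame (the names a, b and c are placeholders):

```r
# Toy data with a few missing values
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(10, 20, NA, 40),
  c = c(5, 6, 7, 8)
)

# Number of NAs in each column
na_per_column <- colSums(is.na(df))
print(na_per_column)

# Drop incomplete rows and report how many were lost
df_clean <- na.omit(df)
rows_dropped <- nrow(df) - nrow(df_clean)
print(rows_dropped)
```

If a single column accounts for most of the NAs, dropping or imputing that column may cost you fewer rows than na.omit() on the whole data frame.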

One more remark: you are passing all of dados_treino as X. You need to drop the class variable from the selection.

Here is a reformulation that may help you:

library(dplyr)  # provides %>% and select()
dados_treino_bal <- SMOTE(dados_treino %>% select(-Inadimplente), target = dados_treino$Inadimplente)
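
Note also that, going by the signature quoted in the question (K = 5), K is the number of nearest neighbours and not a formula, so K = ~ . should be replaced by an integer or left at its default. Putting the pieces together, here is a rough end-to-end sketch on simulated data. The toy data frame and the columns x1 and x2 are made up, and I am assuming, from my reading of the smotefamily documentation, that the result is a list whose data element holds the balanced set with the label in a column named class:

```r
set.seed(42)

# Simulated imbalanced data: 40 majority rows, 10 minority rows
toy <- data.frame(
  x1 = c(rnorm(40, mean = 0), rnorm(10, mean = 3)),
  x2 = c(rnorm(40, mean = 0), rnorm(10, mean = 3)),
  Inadimplente = factor(c(rep("nao", 40), rep("sim", 10)))
)

# Predictors only: drop the class column and confirm everything is numeric
X <- toy[, setdiff(names(toy), "Inadimplente"), drop = FALSE]
stopifnot(all(sapply(X, is.numeric)), !anyNA(X))

# Run SMOTE only if the package is installed
if (requireNamespace("smotefamily", quietly = TRUE)) {
  res <- smotefamily::SMOTE(X, target = toy$Inadimplente, K = 5, dup_size = 0)
  toy_bal <- res$data          # assumed: balanced data lives in $data
  print(table(toy_bal$class))  # assumed: label column is named "class"
}
```

If that assumption about the return value holds, dados_treino_bal in the line above would be a list as well, and the balanced data frame would be dados_treino_bal$data.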

Upvotes: 0
