iftach s

Reputation: 25

Use logistic regression on a data set with repeated k-fold cross-validation in R

I am trying to predict whether water is safe to drink. The data set is the one here: https://www.kaggle.com/adityakadiwal/water-potability?select=water_potability.csv. Assume the dataframe contains the columns Ph, Hardness, Solids, Chloramines and Potability.

I'd like to run logistic regression with 10-fold cross-validation (as an example; I want to try other numbers of folds as well). Disregarding the computational power needed, I'd then like to repeat this with 5 different randomized 10-fold splits and choose the best model.

I have come across a k-fold function and the glm function, but I don't know how to combine them to repeat this process 5 randomized times. Later on, I'd also like to do something similar with KNN. I'd appreciate any help on this matter.

Some code:

library(readr)
library(caret)

df <- read_csv("water_potability.csv")

train_model <- trainControl(method = "repeatedcv",
                            number = 10, repeats = 5)

model <- train(Potability ~ ., data = df, method = "regLogistic",
               trControl = train_model)

However, I'd prefer to use non-regularized logistic regression.

Upvotes: 1

Views: 1039

Answers (1)

Maurits Evers

Reputation: 50668

You can do the following (based on some sample data, read in below, since your post doesn't include any).

library(caret)

# Sample data since your post doesn't include sample data
df <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")

# Make sure the response `admit` is a `factor`
df$admit <- factor(df$admit)

# Set up 10-fold CV repeated 5 times
train_model <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

# Train the model
model <- train(
    admit ~ ., 
    data = df, 
    method = "glm",
    family = "binomial",
    trControl = train_model)
model
#Generalized Linear Model 
#
#400 samples
#  3 predictor
#  2 classes: '0', '1' 
#
#No pre-processing
#Resampling: Cross-Validated (10 fold, repeated 5 times) 
#Summary of sample sizes: 359, 361, 360, 360, 359, 361, ... 
#Resampling results:
#
#  Accuracy   Kappa    
#  0.7020447  0.1772786
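
If you want the coefficient estimates of the underlying non-regularized logistic regression, one option is to inspect the final glm fit that caret stores in the train object:

# The fitted (non-regularized) glm is stored in model$finalModel
summary(model$finalModel)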

We can look at the confusion matrix for good measure:

confusionMatrix(predict(model), df$admit)
#Confusion Matrix and Statistics
#
#          Reference
#Prediction   0   1
#         0 253  98
#         1  20  29
#
#              Accuracy : 0.705           
#                95% CI : (0.6577, 0.7493)
#   No Information Rate : 0.6825          
#   P-Value [Acc > NIR] : 0.1809          
#
#                 Kappa : 0.1856          
#
#Mcnemar's Test P-Value : 1.356e-12       
#                                          
#            Sensitivity : 0.9267          
#            Specificity : 0.2283          
#         Pos Pred Value : 0.7208          
#         Neg Pred Value : 0.5918          
#             Prevalence : 0.6825          
#         Detection Rate : 0.6325          
#   Detection Prevalence : 0.8775          
#      Balanced Accuracy : 0.5775          
#                                          
#       'Positive' Class : 0     
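
Since you also mention wanting to do something similar with KNN and then pick the best model, here is a minimal sketch (reusing the df, train_model and model objects from above) of how the same repeated-CV setup can be passed to caret's "knn" method and the two cross-validated fits compared with resamples(); the seed, tuneLength and preprocessing choices here are only illustrative assumptions.

# Reuse the same repeated-CV specification for KNN (illustrative settings)
set.seed(2022)
model_knn <- train(
    admit ~ .,
    data = df,
    method = "knn",
    preProcess = c("center", "scale"),  # KNN is distance-based, so scale the predictors
    tuneLength = 10,                    # try 10 candidate values of k
    trControl = train_model)

# Compare the cross-validated logistic regression and KNN fits
comparison <- resamples(list(logistic = model, knn = model_knn))
summary(comparison)

summary(comparison) reports accuracy and kappa for both models across the 50 resamples (10 folds x 5 repeats), which gives you a basis for choosing between them.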

Upvotes: 1
