Jim421616

Reputation: 1536

When using `caret` for kNN, I get "Error in terms.formula(formula, data = data) : '.' in formula and no 'data' argument"

I'm trying to use caret to find the optimum k for a kNN analysis of some data:

library(tidyverse)
library(caret)

# Read and clean up the data
ugriz  <- read.table("QSOs_1st_50k.dat-mags.dat")
ugriz[ugriz == -999] <- NA
fields <- c('name', 'z','delta_z','NED_class','SDSS_class','no_radio','radio_max','no_UV', 'UV_min',
    'u', 'g', 'r', 'i', 'z_mag', 'I', 'J', 'H', 'K', 'W1', 'SPIT_5',
    'W2', 'SPIT_8', 'W3', 'W4', 'NUV', 'FUV')
names(ugriz) <- fields

sample_n(ugriz, 5)
attach(ugriz)

# Randomly split the dataset into training and testing subsets
set.seed(123) # for reproducible randomness in producing training and test sets
training.samples <- z %>% createDataPartition(p=0.5, list = FALSE)
train.data <- ugriz[training.samples]
test.data <- ugriz[-training.samples]

model <- train(z~., data = train.data, method = "knn",
    trControl = trainControl("cv", number = 10),
    preProcess = c("center","scale"),
    tuneLength = 10)

My aim is to test the predictions of z against the magnitude values of the columns 'u', 'g', 'r', 'i', 'z_mag', 'I', 'J', 'H', 'K', 'W1', 'SPIT_5', 'W2', 'SPIT_8', 'W3', 'W4', 'NUV', 'FUV', but I keep coming up against the error

Error in terms.formula(formula, data = data) : 
  '.' in formula and no 'data' argument

If I change the formula to something like

model <- train(z~u, data = train.data, method = "knn",
    trControl = trainControl("cv", number = 10),
    preProcess = c("center","scale"),
    tuneLength = 10) # Gives error

I get

Error in eval(predvars, data, env) : 
  invalid 'envir' argument of type 'character'

I'm using RStudio v 1.3.959 with R v 4.0.0. Googling the error gives me links to the same error in neuralnet, but nothing in caret. Here it looks like there's a bug in some earlier version of R.

What's causing the error?

Upvotes: 1

Views: 110

Answers (1)

UseR10085

Reputation: 8136

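You have made a mistake in the data partitioning: a "," is missing after training.samples in both subsetting lines. Without the comma, ugriz[training.samples] does not select rows, so train.data is no longer a data frame and train() fails. As you have not provided any data, I am using the iris data: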

library(caret)
library(tidyverse)
# Randomly split the dataset into training and testing subsets
set.seed(123) # for reproducible randomness in producing training and test sets
training.samples <- createDataPartition(iris$Species, p = 0.5, list = FALSE)
train.data <- iris[training.samples, ]
test.data <- iris[-training.samples, ]

train(Species~Sepal.Length, data = train.data, method = "knn",
      trControl = trainControl("cv", number = 10),
      preProcess = c("center","scale"),
      tuneLength = 10)

This runs without any error.
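To see why the missing comma matters, here is a minimal base-R sketch. It assumes the index behaves like the one-column integer matrix that createDataPartition(..., list = FALSE) returns; the row numbers in `idx` are hypothetical stand-ins so the example runs without caret.

```r
# Hypothetical stand-in for the one-column integer matrix returned by
# caret::createDataPartition(..., list = FALSE).
idx <- matrix(c(1L, 51L, 101L), ncol = 1)

# Without the comma: a matrix index makes `[.data.frame` fall back to
# as.matrix(iris)[idx], so the result is a plain vector, not a data frame.
no_comma <- iris[idx]
print(class(no_comma))          # character, because iris has a factor column
print(is.data.frame(no_comma))  # FALSE

# With the comma: the index selects rows and every column is kept.
with_comma <- iris[idx, ]
print(is.data.frame(with_comma))  # TRUE
print(dim(with_comma))            # 3 rows, 5 columns
```

In the question, ugriz has a text column (name), so ugriz[training.samples] would likewise be a character vector. That would be consistent with the second error, "invalid 'envir' argument of type 'character'".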

Upvotes: 1
