Reputation: 1536
I'm trying to use caret to find the optimum k for a kNN analysis of some data:
library(tidyverse)
library(caret)
# Read and clean up the data
ugriz <- read.table("QSOs_1st_50k.dat-mags.dat")
ugriz[ugriz == -999] <- NA
fields <- c('name', 'z','delta_z','NED_class','SDSS_class','no_radio','radio_max','no_UV', 'UV_min',
'u', 'g', 'r', 'i', 'z_mag', 'I', 'J', 'H', 'K', 'W1', 'SPIT_5',
'W2', 'SPIT_8', 'W3', 'W4', 'NUV', 'FUV')
names(ugriz) <- fields
sample_n(ugriz, 5)
attach(ugriz)
# Randomly split the dataset into training and testing subsets
set.seed(123) # for reproducible randomness in producing training and test sets
training.samples <- z %>% createDataPartition(p=0.5, list = FALSE)
train.data <- ugriz[training.samples]
test.data <- ugriz[-training.samples]
model <- train(z~., data = train.data, method = "knn",
trControl = trainControl("cv", number = 10),
preProcess = c("center","scale"),
tuneLength = 10)
My aim is to test the predictions of z against the magnitude values of the columns 'u', 'g', 'r', 'i', 'z_mag', 'I', 'J', 'H', 'K', 'W1', 'SPIT_5', 'W2', 'SPIT_8', 'W3', 'W4', 'NUV', 'FUV', but I keep coming up against the error
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
If I change the formula to something like
model <- train(z~u, data = train.data, method = "knn",
trControl = trainControl("cv", number = 10),
preProcess = c("center","scale"),
tuneLength = 10) # Gives error
I get
Error in eval(predvars, data, env) :
invalid 'envir' argument of type 'character'
I'm using RStudio v 1.3.959, with R v 4.0.0
Googling the error gives me links to the same error in neuralnet, but nothing in caret. Here it looks like there's a bug in some earlier version of R.
What's causing the error?
Upvotes: 1
Views: 110
Reputation: 8136
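The mistake is in the data partitioning: you are missing a "," after training.samples. createDataPartition(..., list = FALSE) returns a one-column matrix of row indices, and indexing a data frame with a matrix and no comma, as in ugriz[training.samples], does not select rows. It returns a plain vector of cell values instead of a data frame, which is why train() complains about the data argument. Use ugriz[training.samples, ] and ugriz[-training.samples, ] instead. As you have not provided any data, I am using the iris data: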
library(caret)
library(tidyverse)
# Randomly split the dataset into training and testing subsets
set.seed(123) # for reproducible randomness in producing training and test sets
training.samples <- createDataPartition(iris$Species, p = 0.5, list = FALSE)
# Note the comma: select rows, keep all columns
train.data <- iris[training.samples, ]
test.data <- iris[-training.samples, ]
train(Species ~ Sepal.Length, data = train.data, method = "knn",
      trControl = trainControl("cv", number = 10),
      preProcess = c("center", "scale"),
      tuneLength = 10)
This runs without any error.
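To see why the missing comma matters, here is a minimal base-R sketch. The index matrix idx is a hand-built stand-in for what createDataPartition(list = FALSE) returns (an assumption made for illustration, so the snippet runs without caret):

```r
# Hypothetical stand-in for createDataPartition(..., list = FALSE):
# a one-column integer matrix of row numbers.
idx <- matrix(c(1L, 51L, 101L), ncol = 1)

# Without the comma: the data frame is converted to a matrix and
# indexed cell by cell, so the result is a vector, not a data frame.
bad <- iris[idx]
is.data.frame(bad)   # FALSE
class(bad)

# With the comma: row selection, which keeps the data frame structure.
good <- iris[idx, ]
is.data.frame(good)  # TRUE
dim(good)            # 3 rows, 5 columns
```

The same applies to the test set, so in your code both lines need the comma: ugriz[training.samples, ] and ugriz[-training.samples, ].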
Upvotes: 1