I am trying to run a linear discriminant analysis (lda()) in R using the MASS package on a data.frame I created by dividing several variables by an additional scaling variable (not shown here). Below is a sample dataset and a sample version of the code I am using that reproduces the error.
class Var1 Var2 Var3 Var4
2 0.732459522 0.973014649 0.612952968 0.127216654
3 0.76692254 0.990230286 0.629448709 0.104675506
2 0.847487002 1.021663778 0.649046794 0.187175043
3 0.823583181 1.050274223 0.673674589 0.170018282
1 0.796279894 1.058458813 0.583702391 0.222320638
2 0.925681255 1.009909166 0.636663914 0.205615194
2 0.627334465 1.074702886 0.59762309 0.23344652
3 0.980376124 1.011447261 0.646770237 0.232215863
3 0.79342723 1.048826291 0.750234742 0.248826291
1 0.960655738 1.042622951 0.6 0.262295082
2 0.963788301 1.005571031 0.590529248 0.233983287
1 1.013157895 1.049342105 0.657894737 0.223684211
2 1.211538462 1.060897436 0.733974359 0.288461538
3 1.25083612 1.023411371 0.759197324 0.311036789
3 0.959196485 1.009416196 0.635907094 0.12868801
1 0.823681936 1.005185825 0.590319793 0.219533276
2 0.777508091 0.998381877 0.624595469 0.165048544
3 0.749114103 0.985825656 0.585400425 0.133947555
1 0.816999133 1.036426713 0.604509974 0.197745013
library(MASS)
data<-read.csv("data.csv",header=TRUE)
data_train<-na.omit(data)
scores_train<-data_train[-c(1)]
lda_train<-lda(data_train$class~scores_train,prior = c(1,1,1)/3,CV=TRUE)
scores_test<-data[-c(1)]
lda_test<-predict(lda_train,as.data.frame(scores_test),prior = c(1,1,1)/3)
lda_train<-lda(data_train$class~as.matrix(scores_train),prior = c(1,1,1)/3,CV=TRUE)
class(scores_train)
class(scores_test)
When I run lda() on the dataset, I get the following error message:
Error in model.frame.default(formula = data_train$class ~ scores_train) :
invalid type (list) for variable 'scores_train'
I can get it to work by coercing the data into a matrix with as.matrix(). Notably, doing something similar with as.data.frame() or data.frame() does not work. However, when I then try to apply the resulting discriminant function to the full dataset, I get the following message:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "list"
However, when I check the class of the objects using class(), both are reported as data.frame. I checked the dataset for incomplete rows or columns that could cause R to treat them as a series of lists instead of a single data.frame, but there are no missing values. Similarly, the problem does not appear to be caused by the variable names.
I am not sure why R is treating the object as a list instead of a data.frame (and thereby causing the linear discriminant analysis to fail), especially as it recognizes the objects as being of class data.frame.
For lda(), name the columns in the formula and pass the data frame through the data argument; this works:
lda_train<-lda(class ~ .,data=data_train,prior = c(1,1,1)/3,CV=TRUE)
Otherwise, if you don't want to use a formula, pass the predictors and the grouping directly:
lda(grouping=data_train$class,x=data_train[,-1],prior = c(1,1,1)/3, CV=TRUE)
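A quick way to check that the two calling conventions are equivalent is to run both on a built-in dataset. This is a sketch that assumes the `iris` data (bundled with R) as a stand-in for the question's `data.csv`, with `Species` playing the role of `class`:

```r
library(MASS)

# iris stands in for the question's data: Species is the grouping,
# the four numeric columns are the predictors
d <- iris
prior <- rep(1, 3) / 3

# Formula interface: refer to columns by name and pass the data frame via `data`
fit_formula <- lda(Species ~ ., data = d, prior = prior, CV = TRUE)

# x/grouping interface: pass the predictor columns and the labels directly
fit_xy <- lda(x = d[, 1:4], grouping = d$Species, prior = prior, CV = TRUE)

# Both return leave-one-out class predictions, which should agree
all(fit_formula$class == fit_xy$class)
```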
With CV=TRUE, lda() uses leave-one-out cross-validation to return the class assignments and posterior probabilities, but it does not keep a fitted model, as you can see:
class(lda_train)
[1] "list"
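To see exactly what `CV = TRUE` hands back, and why `predict()` then fails, the returned object can be inspected directly. Again a sketch on the built-in `iris` data standing in for the question's data:

```r
library(MASS)

# Leave-one-out fit: the result is a plain list, not an "lda" model
loo <- lda(Species ~ ., data = iris, prior = rep(1, 3) / 3, CV = TRUE)

class(loo)             # "list"
inherits(loo, "lda")   # FALSE, so predict() has no lda method to dispatch to

# What it does contain: one LOO class and one posterior row per observation
str(loo$class)
dim(loo$posterior)     # 150 3

# LOO confusion table and accuracy estimate
table(observed = iris$Species, predicted = loo$class)
mean(loo$class == iris$Species)

# Calling predict() on it reproduces the error from the question
tryCatch(predict(loo), error = function(e) conditionMessage(e))
```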
To predict, you need to train with CV=FALSE (the default). Then provide a data.frame or matrix that has the same column names as the one used for training; in your case it will be:
lda_train<-lda(class ~ .,data=data_train,prior = c(1,1,1)/3)
data_test=data.frame(Var1=rnorm(10),Var2=rnorm(10),
Var3=rnorm(10),Var4=rnorm(10))
predict(lda_train,data_test)
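A slightly more realistic version of the same workflow fits on one part of the data and predicts on a held-out part. This sketch again uses `iris` in place of the question's data; the 100/50 split, the seed, and the names `train_idx`, `train` and `test` are illustrative choices. The point it shows is that `newdata` only needs the predictor columns, with the same names as in training:

```r
library(MASS)

set.seed(1)                                   # illustrative seed for a reproducible split
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# CV = FALSE (the default) returns an object of class "lda" that predict() accepts
fit <- lda(Species ~ ., data = train, prior = rep(1, 3) / 3)
class(fit)                                    # "lda"

# newdata needs columns with the same names as the training predictors;
# the response column is not required
pred <- predict(fit, newdata = test[, c("Sepal.Length", "Sepal.Width",
                                        "Petal.Length", "Petal.Width")])

table(observed = test$Species, predicted = pred$class)
mean(pred$class == test$Species)              # hold-out accuracy
```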
For lda() from MASS, there is no hyper-parameter to be obtained from training, so could you elaborate on why you need the cross-validation?
If you want to explore it anyway, here is how you can run k-fold cross-validation for lda with the caret package (note the method name "lda2"):
library(caret)  # train() and trainControl() come from caret, not MASS
data_train$class = factor(data_train$class)
lda_train = train(class ~ ., data = data_train, method = "lda2",
                  trControl = trainControl(method = "cv"))
predict(lda_train, data_test)
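If you would rather not depend on caret, k-fold cross-validation around `MASS::lda` can also be written by hand. A minimal sketch, assuming 10 folds, the `iris` data as a stand-in, and a hypothetical helper named `cv_lda_accuracy`; it only estimates accuracy and tunes nothing, since lda has no hyper-parameter to tune:

```r
library(MASS)

# Hypothetical helper: k-fold cross-validated accuracy of lda() on a data frame
# whose grouping column is named by `response`; extra arguments (e.g. prior)
# are passed on to lda()
cv_lda_accuracy <- function(df, response, k = 10, ...) {
  folds <- sample(rep(seq_len(k), length.out = nrow(df)))  # random fold labels
  fml <- reformulate(".", response = response)             # e.g. Species ~ .
  correct <- logical(nrow(df))
  for (f in seq_len(k)) {
    held_out <- folds == f
    fit <- lda(fml, data = df[!held_out, ], ...)           # fit on the other folds
    pred <- predict(fit, newdata = df[held_out, ])         # predict the held-out fold
    correct[held_out] <- pred$class == df[[response]][held_out]
  }
  mean(correct)                                            # pooled accuracy
}

set.seed(42)
acc <- cv_lda_accuracy(iris, "Species", k = 10, prior = rep(1, 3) / 3)
acc
```

Pooling the per-row correctness across folds, rather than averaging per-fold accuracies, keeps the estimate well defined even when the folds differ slightly in size.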
The formula argument expects a formula declaring how the variables relate, and each variable named in it must be a vector. You can name all the columns of one data frame and pass that data frame as the data argument:
lda(class ~ Var1 + Var2 + Var3 + Var4,
data = data, prior = c(1,1,1)/3, CV=TRUE)
Or pass the columns separately:
lda(data$class ~ scores_train$Var1 +
scores_train$Var2 +
scores_train$Var3 +
scores_train$Var4,
prior = c(1,1,1)/3, CV=TRUE)
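The reason the original call fails is that a data.frame is internally a list of column vectors, and `model.frame()`, which `lda()` uses to build its design matrix, rejects a list as a single formula term, while vectors, factors and matrices are accepted. A small sketch with made-up toy objects `y` and `df` illustrates this:

```r
# Toy objects, made up for illustration
y  <- factor(c("a", "b", "a", "b"))
df <- data.frame(v1 = c(1.2, 3.4, 2.2, 4.1), v2 = c(0.5, 0.7, 0.4, 0.9))

class(df)     # "data.frame"
typeof(df)    # "list": the underlying storage is a list of columns
is.list(df)   # TRUE

# A whole data frame used as a single formula term is rejected ...
err <- tryCatch(model.frame(y ~ df), error = function(e) conditionMessage(e))
err

# ... whereas a matrix counts as one (multi-column) variable and is accepted
mf <- model.frame(y ~ as.matrix(df))
ncol(mf)      # 2: the response plus one matrix-valued term
```

This is also why the as.matrix() workaround in the question gets past the error.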
For the problem of predict() not accepting the object, you need to set CV to FALSE; otherwise lda() returns only a list, not the lda object that predict() needs:
model <- lda(data$class ~ scores_train$Var1 +
scores_train$Var2 +
scores_train$Var3 +
scores_train$Var4,
prior = c(1,1,1)/3, CV=FALSE)
predict(model)
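For reference, `predict()` on an lda fit returns a list with the predicted `class`, the `posterior` probabilities, and `x`, the scores on the linear discriminants. A short sketch on `iris` (standing in for the question's data) shows their shapes. With no `newdata`, the predictions are for the training rows themselves (resubstitution), which is usually more optimistic than the leave-one-out estimate from `CV = TRUE`:

```r
library(MASS)

model <- lda(Species ~ ., data = iris, prior = rep(1, 3) / 3)
pred  <- predict(model)        # no newdata: predictions for the training rows

names(pred)                    # "class" "posterior" "x"
dim(pred$posterior)            # one row per observation, one column per class
dim(pred$x)                    # one column per discriminant (at most classes - 1)

# Resubstitution accuracy vs. the leave-one-out estimate
resub <- mean(pred$class == iris$Species)
loo   <- mean(lda(Species ~ ., data = iris, prior = rep(1, 3) / 3,
                  CV = TRUE)$class == iris$Species)
c(resubstitution = resub, leave_one_out = loo)
```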