user2352714

Reputation: 356

Why is R treating this data.frame object as a list?

I am trying to run a linear discriminant analysis (lda() from the MASS package) in R on a data.frame I created by dividing several variables by an additional scaling variable (not shown here). Below is a sample dataset and a version of my code that reproduces the error.

class   Var1    Var2    Var3    Var4
2   0.732459522 0.973014649 0.612952968 0.127216654
3   0.76692254  0.990230286 0.629448709 0.104675506
2   0.847487002 1.021663778 0.649046794 0.187175043
3   0.823583181 1.050274223 0.673674589 0.170018282
1   0.796279894 1.058458813 0.583702391 0.222320638
2   0.925681255 1.009909166 0.636663914 0.205615194
2   0.627334465 1.074702886 0.59762309  0.23344652
3   0.980376124 1.011447261 0.646770237 0.232215863
3   0.79342723  1.048826291 0.750234742 0.248826291
1   0.960655738 1.042622951 0.6 0.262295082
2   0.963788301 1.005571031 0.590529248 0.233983287
1   1.013157895 1.049342105 0.657894737 0.223684211
2   1.211538462 1.060897436 0.733974359 0.288461538
3   1.25083612  1.023411371 0.759197324 0.311036789
3   0.959196485 1.009416196 0.635907094 0.12868801
1   0.823681936 1.005185825 0.590319793 0.219533276
2   0.777508091 0.998381877 0.624595469 0.165048544
3   0.749114103 0.985825656 0.585400425 0.133947555
1   0.816999133 1.036426713 0.604509974 0.197745013
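
library(MASS)  # provides lda()
data<-read.csv("data.csv",header=TRUE)
data_train<-na.omit(data)
scores_train<-data_train[-c(1)]  # drop the class column
# fails: invalid type (list) for variable 'scores_train'
lda_train<-lda(data_train$class~scores_train,prior = c(1,1,1)/3,CV=TRUE)
scores_test<-data[-c(1)]
lda_test<-predict(lda_train,as.data.frame(scores_test),prior = c(1,1,1)/3)

# workaround: this fits, but predict() then fails with the "list" error
lda_train<-lda(data_train$class~as.matrix(scores_train),prior = c(1,1,1)/3,CV=TRUE)
class(scores_train)  # reports "data.frame"
class(scores_test)   # reports "data.frame"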

When I try to perform the lda using the dataset, I get the following error message.

Error in model.frame.default(formula = data_train$class ~ scores_train) : 
  invalid type (list) for variable 'scores_train'

I can get the model to fit by coercing the scores to a matrix with as.matrix(). Notably, doing something similar with as.data.frame() or data.frame() does not work. However, when I then try to apply the resulting discriminant function to the full dataset, I get the following message:

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "list"

However, when I check the class of the objects using class(), both are reported as data.frame. I checked the dataset for incomplete rows or columns that could cause R to treat them as a series of lists instead of a single data.frame, but there are no missing values. It also does not appear to be caused by the variable names.

I am not sure why R is treating the object as a list instead of a data.frame (and thereby causing the linear discriminant analysis to fail), especially as class() reports that the objects are data.frames.

Upvotes: 1

Views: 820

Answers (2)

StupidWolf

Reputation: 46898

For lda(), you can use the formula interface, which works if you pass the data frame through the data argument:

lda_train<-lda(class ~ .,data=data_train,prior = c(1,1,1)/3,CV=TRUE)

Alternatively, without a formula, pass the grouping vector and the predictor columns directly:

lda(grouping=data_train$class,x=data_train[,-1],prior = c(1,1,1)/3, CV=TRUE)
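A quick way to convince yourself that the two calls above fit the same model is to compare their discriminant coefficients. This is a minimal sketch, assuming the built-in iris dataset as a stand-in for the question's data.csv (Species plays the role of class, the four numeric columns play Var1 to Var4):

```r
library(MASS)

# iris stands in for the question's data: Species ~ class, columns 1:4 ~ Var1..Var4
prior <- c(1, 1, 1) / 3

# Formula interface: column names are resolved inside `data`
fit_formula <- lda(Species ~ ., data = iris, prior = prior)

# x/grouping interface: predictors and class labels passed separately
fit_xy <- lda(x = iris[, 1:4], grouping = iris$Species, prior = prior)

# Both routes end up in the same underlying computation
all.equal(fit_formula$scaling, fit_xy$scaling)
all.equal(fit_formula$means, fit_xy$means)
```

The formula route is usually the more convenient one, because the fitted object remembers the column names and predict() can later look them up in new data.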

When you use CV=TRUE, lda() performs leave-one-out cross-validation and returns the cross-validated class assignments and posterior probabilities. Unfortunately it does not return the fitted model, only a plain list, as you can see:

class(lda_train)
[1] "list"
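The list returned with CV=TRUE is still useful: its class and posterior components give a leave-one-out estimate of how well LDA classifies your data. A minimal sketch, again assuming iris as a stand-in for the question's data:

```r
library(MASS)

# iris stands in for the question's data (Species plays the role of `class`)
cv_fit <- lda(Species ~ ., data = iris, prior = c(1, 1, 1) / 3, CV = TRUE)

class(cv_fit)            # a plain list, so predict() has no method for it
inherits(cv_fit, "lda")  # FALSE

# Leave-one-out posterior probabilities, one row per observation
head(cv_fit$posterior)

# Leave-one-out accuracy and confusion table
loo_accuracy <- mean(cv_fit$class == iris$Species)
loo_accuracy
table(observed = iris$Species, predicted = cv_fit$class)
```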

To predict, you need to train with CV=FALSE. Then provide a data.frame or matrix with the same column names as those used for training; in your case it will be:

lda_train<-lda(class ~ .,data=data_train,prior = c(1,1,1)/3)
data_test=data.frame(Var1=rnorm(10),Var2=rnorm(10),
Var3=rnorm(10),Var4=rnorm(10))
predict(lda_train,data_test)
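The column-name requirement can be checked directly: with a formula fit, predict() finds the predictors in the new data by name. A minimal sketch, assuming iris as a stand-in for the question's data and a random 30-row holdout as the "test" set; the renamed column V1 is a hypothetical example of a mismatch:

```r
library(MASS)

# iris stands in for the question's data; hold out 30 rows as new data
set.seed(1)
test_rows <- sample(nrow(iris), 30)
train_df <- iris[-test_rows, ]
test_df <- iris[test_rows, 1:4]  # predictors only, same names as in training

fit <- lda(Species ~ ., data = train_df, prior = c(1, 1, 1) / 3)  # CV = FALSE
class(fit)  # "lda", so predict() dispatches to MASS's predict method

pred <- predict(fit, test_df)
head(pred$class)      # predicted classes for the held-out rows
head(pred$posterior)  # posterior probabilities per class

# Renaming a predictor breaks the lookup by name, so predict() errors
bad_df <- setNames(test_df, c("V1", "Sepal.Width", "Petal.Length", "Petal.Width"))
err <- tryCatch(predict(fit, bad_df), error = function(e) e)
inherits(err, "error")
```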

For lda() from MASS, there is no hyperparameter to tune during training, so could you elaborate on why you need the cross-validation?

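In case you want to explore it, here is how you can run cross-validation for LDA with the caret package (note the use of method = "lda2", caret's LDA variant that tunes the number of discriminant functions, dimen):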

library(caret)  # provides train() and trainControl()
data_train$class = factor(data_train$class)
lda_train = train(class ~ .,data=data_train,method="lda2",
trControl = trainControl(method = "cv"))
predict(lda_train,data_test)
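If you would rather not depend on caret, a hand-rolled k-fold loop around MASS::lda() gives a comparable accuracy estimate. This is a swapped-in sketch, not what train() does internally; it assumes iris as a stand-in for the question's data, and k = 5 and the seed are arbitrary choices:

```r
library(MASS)

# iris stands in for the question's data; k = 5 is an arbitrary choice
set.seed(123)
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))  # fold id for each row
fold_acc <- numeric(k)

for (i in seq_len(k)) {
  train_i <- iris[folds != i, ]
  test_i  <- iris[folds == i, ]
  # Fit with CV = FALSE so the result is an "lda" object predict() accepts
  fit_i   <- lda(Species ~ ., data = train_i, prior = c(1, 1, 1) / 3)
  pred_i  <- predict(fit_i, test_i)$class
  fold_acc[i] <- mean(pred_i == test_i$Species)
}

fold_acc        # accuracy on each held-out fold
mean(fold_acc)  # k-fold cross-validated accuracy estimate
```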

Upvotes: 3

Andy Baxter

Reputation: 7626

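The formula argument expects a formula declaring how the variables relate, and each variable named in it must be a vector (a matrix is also accepted, which is why as.matrix() worked). A data.frame is internally a list, so model.frame() rejects it with "invalid type (list)". You can name the columns directly while passing the data frame through the data argument: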

lda(class ~ Var1 + Var2 + Var3 + Var4, 
    data = data, prior = c(1,1,1)/3, CV=TRUE)
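The root cause of the original error can be reproduced in a few lines. A data.frame is stored as a list of equal-length columns, so class() says "data.frame" while typeof() says "list", and model.frame() (which lda()'s formula method calls) refuses a list as a single variable. A minimal sketch; the toy values and the names scores and grp are made up for illustration:

```r
# Hypothetical toy data standing in for scores_train and the class column
scores <- data.frame(Var1 = c(0.73, 0.77, 0.85, 0.82),
                     Var2 = c(0.97, 0.99, 1.02, 1.05))
grp <- factor(c(2, 3, 2, 3))

class(scores)   # "data.frame"
typeof(scores)  # "list": a data.frame is a list of columns

# A whole data.frame used as one formula variable is rejected ...
msg <- tryCatch(model.frame(grp ~ scores),
                error = function(e) conditionMessage(e))
msg

# ... whereas a matrix is an accepted variable type
mf <- model.frame(grp ~ as.matrix(scores))
nrow(mf)
```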

Or pass the columns separately:

lda(data_train$class ~ scores_train$Var1 +  
      scores_train$Var2 + 
      scores_train$Var3 + 
      scores_train$Var4, 
    prior = c(1,1,1)/3, CV=TRUE)

To fix predict() not accepting the object, set CV to FALSE; otherwise lda() returns only a list, not the "lda" object that predict() needs:

model <- lda(data_train$class ~ scores_train$Var1 +  
      scores_train$Var2 + 
      scores_train$Var3 + 
      scores_train$Var4, 
    prior = c(1,1,1)/3, CV=FALSE)

predict(model)
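One caveat with the `$` style of formula: the model's terms are literally `scores_train$Var1` and so on, so when you later call predict() with new data, R cannot find those names as columns and falls back to the objects in your workspace, silently predicting the training rows again. A minimal sketch of the difference, assuming iris as a stand-in for the question's data (train_df and new_rows are illustrative names):

```r
library(MASS)

# iris stands in for the question's data; split into training and new rows
set.seed(42)
idx <- sample(nrow(iris), 100)
train_df <- iris[idx, ]
new_rows <- iris[-idx, ]  # 50 rows

# `$`-style formula: terms are train_df$Sepal.Length etc.
fit_dollar <- lda(train_df$Species ~ train_df$Sepal.Length + train_df$Sepal.Width +
                    train_df$Petal.Length + train_df$Petal.Width)

# data= style formula: terms are plain column names
fit_named <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                 data = train_df)

# new_rows has no column called train_df, so the `$` terms are looked up in the
# calling environment and the training rows are used instead (R may warn here)
pred_dollar <- predict(fit_dollar, new_rows)
pred_named <- predict(fit_named, new_rows)

length(pred_dollar$class)  # number of training rows, not new rows
length(pred_named$class)   # number of new rows
```

For predicting on new data, the data= form is therefore the safer of the two.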

Upvotes: 1
