Reputation: 1
Please give me a simple example. I am stuck! I tried the errorest function, following the example it gives for 10-fold CV of LDA, but when I used my own data it just said the predictions are not numeric. I don't know why! Thank you! My R code is below. I want to do binary LDA, so I generate the data:
library(MASS)
n=500
#generate x1 and x2.
Sigma=matrix(c(2,0,0,1),nrow=2,ncol=2)
#Logistic model with parameters {1, 4, -2}
beta.star=c(1,4,-2)
Xtilde=mvrnorm(n=n,mu=c(0.5,2),Sigma=Sigma)
X=cbind(1,Xtilde)
z=X%*%beta.star
#pass through an inverse-logit function
pr=exp(z)/(1+exp(z))
#Simulate binary response
# The probability of response is a vector
y=rbinom(n,1,pr)
Then I use LDA to fit the model:
library(MASS)
df.cv=data.frame(V1=Xtilde[,1],V2=Xtilde[,2])
exper1<-lda(y~V1+V2,data=df.cv)
plda<-predict(exper1,newdata=df.cv)
Finally I want to use CV with the original data and estimate the error. I do this, which is wrong:
mypredict.lda <- function(object, newdata)
predict(object, newdata = newdata)$class
errorest(y ~ ., data=data.frame(da), model=lda,estimator ="cv", predict= as.numeric(mypredict.lda))
What should I do to get the error with CV?
Upvotes: 0
Views: 3495
Reputation: 1189
So we start with all your previous code setting up the fake data:
library(MASS)
n=500
#generate x1 and x2.
Sigma=matrix(c(2,0,0,1),nrow=2,ncol=2)
#Logistic model with parameters {1, 4, -2}
beta.star=c(1,4,-2)
Xtilde=mvrnorm(n=n,mu=c(0.5,2),Sigma=Sigma)
X=cbind(1,Xtilde)
z=X%*%beta.star
#pass through an inverse-logit function
pr=exp(z)/(1+exp(z))
#Simulate binary response
y=rbinom(n,1,pr)
#Now we do the LDA
df.cv=data.frame(V1=Xtilde[,1],V2=Xtilde[,2])
Below, we divide the data into two parts: a training set and a test set. Note that a single 80/20 split is hold-out validation, not five-fold cross validation; k-fold CV refits the model k times, each time holding out a different fold, so every observation is held out exactly once. If you want a split that mirrors one fold of 10-fold cross validation, use 0.9 instead of 0.8 (90% train, 10% test).
library(ROCR)
inds=sample(1:nrow(df.cv),0.8*nrow(df.cv))
df.train=df.cv[inds,]
df.test=df.cv[-inds,]
train.model = lda(y[inds] ~ V1+V2, data=df.train)
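If you want actual 10-fold cross validation rather than a single split, you can loop over folds yourself. A minimal sketch, reusing df.cv and y from above:

```r
# 10-fold CV for LDA: assign each row to one of k folds,
# then hold each fold out in turn and record the misclassification rate
set.seed(1)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(df.cv)))
cv.err <- numeric(k)
for (i in 1:k) {
  test <- folds == i
  fit <- lda(y[!test] ~ V1 + V2, data = df.cv[!test, ])
  pred <- predict(fit, df.cv[test, ])$class
  cv.err[i] <- mean(pred != y[test])
}
mean(cv.err)  # the CV estimate of the error rate
```

The mean over folds is the cross-validated error estimate; the vector cv.err also shows how much the error varies across folds.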
From the trained model, we predict on the test set. Below, I compute the predicted values and then assess the accuracy of the predictions. Here I use a ROC curve, but you can use whatever metric you like; I wasn't sure what you meant by "error".
#use the posterior probability of class 1 (not the hard class label), so the ROC curve has more than one cutoff
preds=predict(train.model, df.test)$posterior[,2]
actual=y[-inds]
aucCurve=performance(prediction(preds,actual), "tpr", "fpr")
plot(aucCurve)
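If by error you meant the plain misclassification rate, you can compute it from the same fit and hold-out split (this reuses train.model, df.test, and actual from above):

```r
# fraction of test observations whose predicted class differs from the truth
pred.class <- predict(train.model, df.test)$class
holdout.err <- mean(as.character(pred.class) != as.character(actual))
holdout.err
```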
The area under this ROC curve is a measure of predictive accuracy. Values closer to 1 mean you have good predictive capability.
auc=performance(prediction(preds,actual), "auc")
auc@y.values
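For completeness, the errorest error in the question most likely comes from wrapping the predict function in as.numeric and from the response not being a factor inside the data frame. A sketch of a call that should work, assuming df.cv holds the predictors and you add the response to it as a factor:

```r
library(ipred)
# for classification, errorest needs the response as a factor inside the data frame
df.cv$y <- factor(y)
mypredict.lda <- function(object, newdata) predict(object, newdata = newdata)$class
# pass the predict function itself (not as.numeric(...)); k = 10 gives 10-fold CV
errorest(y ~ V1 + V2, data = df.cv, model = lda,
         estimator = "cv", est.para = control.errorest(k = 10),
         predict = mypredict.lda)
```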
Hopefully this helped, and isn't horribly incorrect. Other folks please chime in with corrections or clarifications.
Upvotes: 1