donm79
donm79

Reputation: 31

Error running pROC in R: Response and predictor must be vectors of the same length

I've called a multiple logistic regression as follows:

step_1 <- glm(CD3LR ~ alb + surg + ASA_opt + BMI + bil + Hb_cat + MDRD + sex + DM
              + age + Path + Smoking,
              na.action = na.exclude, family = binomial)

When I want to test the model by creating a ROC curve, I use the pROC package and call:

roc(CD3LR, step_1$fitted.values, plot=FALSE)

However this returns the error:

> roc(CD3LR, step_1fitted.values, plot=FALSE) 
Setting levels:control=0,case=1
Error in roc.default(CD3LR,step_1fitted.values, plot = FALSE) : 
  Response and predictor must be vectors of the same length.

I think this is because there are 3 missing values for the MDRD variable and because I've used na.exclude this results in 3 fewer step_1$fitted.values than I have for CD3LR

Is there a way to generate a ROC curve using only the CD3LR values that correspond to the step_1$fitted.values??

Very grateful for any help!

Upvotes: 3

Views: 4946

Answers (2)

StupidWolf
StupidWolf

Reputation: 46968

I notice you have all your variables in the environment, for example like this, and below I introduce 3NA for sex and 2 NA for BMI,

CD3LR = as.numeric(runif(100)>0.5)
alb = rnorm(100)
surg = sample(1:3,100,replace=TRUE)
ASA_opt = rpois(100,50)
BMI = c(NA,NA,rpois(98,100))
bil = rnorm(100)
Hb_cat = sample(1:3,100,replace=TRUE)
MDRD = runif(100)
sex = c(sample(c("M","F"),98,replace=TRUE),NA,NA)
DM = rnorm(100)
age = sample(20:60,100,replace=TRUE)
Path = rnorm(100)
Smoking = sample(c("Yes","NI"),100,replace=TRUE)

So best is to put all of them into a data.frame, do the fit and then the roc curve:

DataFrame = data.frame(
CD3LR,alb,surg,ASA_opt,BMI,bil,Hb_cat,
MDRD,sex,DM,age,Path,Smoking) 

step_1<-glm(CD3LR~alb+surg+ASA_opt+BMI+bil+Hb_cat+MDRD+sex+DM+age+Path+Smoking,
data=DataFrame,na.action=na.exclude,family=binomial)

Thanks to @Calimo for pointing it out, you can also fit with:

step_1<-glm(CD3LR~.,
data=DataFrame,na.action=na.exclude,family=binomial)

This gives you the error:

roc(DataFrame$CD3LR,step_1$fitted.values,plot=FALSE)

We can do:

roc(step_1$y,step_1$fitted.values,plot=FALSE)

Or:

roc(DataFrame[complete.cases(DataFrame),"CD3LR"],step_1$fitted.values,plot=FALSE)

The reason for having a data frame is that you can immediately see the complete set of data which is non NA for all. If you have any NAs in the predictor or response, it will not be used in the regression.

Upvotes: 1

Calimo
Calimo

Reputation: 7969

Your intuition is correct, some values have been filtered out by na.action=na.exclude.

Typically I would recommend using the predict function to obtain new predictions on the data. They look like this:

> predict(step_1)
1           2           3           4           5           6           7   ...
NA          NA  1.04059269  0.60248768  0.81502210  0.23992288  0.08421514  ...

As you can see, the missing values in MDRD produce an NA, rather than being removed.

You can then feed these predictions directly to the roc function:

roc(CD3LR, predict(step_1))

Upvotes: 2

Related Questions