jorn_data

Reputation: 85

Area under ROC curve of large ff data frame

Hi, I am using the following R libraries:

library(pROC)
library(ff)
library(ffbase)
library(biglm)

and the following code to generate a logistic regression model using an ffdf large data frame and compute the area under the ROC curve:

mymodel <- bigglm(outcome ~ x1 + x2 + x3, family = binomial("logit"), data = myffdf, maxit = 20)
summary(mymodel)
pred <- predict(mymodel, myffdf, type = "response")
rocobj <- roc(myffdf$outcome, pred)

I get the following error:

Error in opsff_compare_logic(x, y, "|") : 
operator requires length 1 for e2, recycling not possible

Thank you for any advice on how to get the AUC.

Upvotes: 2

Views: 2295

Answers (2)

jorn_data

Reputation: 85

This will work:

roc(myffdf$outcome[], pred)

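Note the empty square brackets: myffdf$outcome[] reads the ff vector into memory as an ordinary R vector, so roc receives two regular vectors of equal length. This assumes the outcome column fits in RAM.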
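A minimal sketch of the same idea on simulated data, in case it helps to try it without the original data set. The names outcome_ram and outcome_ff and the random data are made up for illustration; only ff() and the empty-bracket extraction are the point here.

```r
library(ff)
library(pROC)

set.seed(1)
outcome_ram <- rbinom(200, 1, 0.5)  # simulated 0/1 outcome (illustrative only)
pred        <- runif(200)           # simulated predicted probabilities

outcome_ff <- ff(outcome_ram)       # store the outcome as an on-disk ff vector

# Empty brackets pull the whole ff vector back into RAM as a plain R vector
class(outcome_ff[])

# roc() now receives two ordinary vectors of the same length
rocobj <- roc(outcome_ff[], pred)
auc(rocobj)
```

With purely random predictions like these the AUC should land near 0.5; the value itself is not meaningful.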

Thanks to user20650 and JVL

Upvotes: 2

JVL

Reputation: 656

The function pROC::roc tries to check for NAs in myffdf$outcome or pred using the following line:

nas <- is.na(response) | is.na(predictor)

But a glance at ffbase::opsff_compare_logic reveals that the | operator can only combine an ff_vector object with another ff_vector object or with a vector of length 1. So the error occurs because myffdf$outcome is an ff_vector, whereas pred is an ordinary vector of length greater than 1.
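For comparison, here is how that NA check behaves on ordinary in-memory vectors, using base R only. The vectors response and predictor below are made-up examples, not taken from the question.

```r
# Two plain R vectors of equal length, each with one missing value
response  <- c(0L, 1L, NA, 1L)
predictor <- c(0.2, 0.9, 0.4, NA)

# Element-wise OR of the two NA masks, as in the line quoted above
nas <- is.na(response) | is.na(predictor)
nas
# [1] FALSE FALSE  TRUE  TRUE

# Observations that would be kept after dropping incomplete pairs
response[!nas]
predictor[!nas]
```

With plain vectors both masks are ordinary logical vectors, so | works element by element; the error in the question arises only when one side is an ff_vector and the other is not.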

A possible solution might be to call

rocobj <- roc(myffdf$outcome, as.ff(pred))

instead.

Upvotes: 1
