user2011

Reputation: 11

R function for ROC or sensitivity analysis for survey data

I'm working with the NHIS 2020 survey and can't find a function for ROC or sensitivity analysis for a logistic regression model. Is there a known function for this? One more question: is there a function for splitting survey data into training and test sets?

I'm trying to split the survey design object, not a plain data frame:

data <- svydesign(id=~PPSU, strata=~PSTRAT, 
                     nest = TRUE, 
                     weights=~WTFA_A,
                     data=dat)

I fitted a GLM after splitting the data 20%/80% and then converting the training part into a survey design (I'm not sure this is the right thing to do):

glm10 <- svyglm(PAIFRQ3M_A ~ NOTCOV_A+ SOCERRNDS_A +WELLNESS_A+
                 COGMEMDFF_A+CURJOBSD_A+SMKEV_A +DRK12MN_A+ 
                 CURJOBSD_A+CVDDIAG_A+ANXLEVEL_A*SMKEV_A+
                 AFVET_A*FGEFRQTRD_A+OPDCHRONIC_A+COMDIFF_A+PHSTAT_A*HOSPONGT_A+
                 EMPDYSMSS2_A+PHSTAT_A+SLPHOURS_A,
               design=as.svrepdesign(train.data) ,na.action=na.omit ,family=quasibinomial)

The sensitivity test I ran was:

fitted<-predict(glm10, return.replicates=TRUE, type="response") 

sensitivity<-function(pred,actual) mean(pred>0.1 & actual)/mean(actual) withReplicates(fitted, sensitivity, actual=glm10$PAIFRQ3M_A)

but I get an error:

Error: unexpected symbol in "sensitivity<-function(pred,actual) mean(pred>0.1 & actual)/mean(actual) withReplicates"

Upvotes: 1

Views: 477

Answers (2)

jpsmith

Reputation: 17450

I have created some sample data and code below that may point you in the right direction, but your question is quite broad, so this might not answer everything. The caTools package can help with splitting the data into training and test sets, and the pROC package can help with ROC analysis.

set.seed(05062020)
# Create sample data
alldata <- data.frame(outcome = sample(0:1, 100, replace = TRUE),
                      predictor1 = sample(1:3, 100, replace = TRUE),
                      predictor2 = sample(1:5, 100, replace = TRUE))

# Split into testing and training
library(caTools)
sample <- sample.split(alldata$outcome, SplitRatio = 0.7)
train <- subset(alldata, sample == TRUE)
test <- subset(alldata, sample == FALSE)

# Run example logistic model
example_model <- glm(outcome ~., family = binomial, data = train)

# get prediction from fitted model
predicts <- predict(example_model, type = "response", newdata = test[,-which(names(test) == "outcome")])

# ROC and plot
library(pROC)
roc(test$outcome, predicts) #ROC

plot.roc(smooth(roc(test$outcome, predicts)), col = 1, lwd = 3, 
         main = "AUC", xlab = "1 - Specificity", legacy.axes = TRUE)
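
As for the "unexpected symbol" error in the question: it arises because the function definition and the withReplicates() call sit on the same line with nothing separating them, so R tries to parse them as one expression. A minimal sketch of the fix is to put each statement on its own line. The toy vectors and the cutoff argument below are hypothetical, added only for illustration; the final survey call is left as a comment because it needs the asker's replicate-weight objects.

```r
# Sensitivity at a fixed cutoff: weight of true positives flagged, relative to all positives.
# The definition is a complete statement on its own lines, so R can parse it.
sensitivity <- function(pred, actual, cutoff = 0.1) {
  mean(pred > cutoff & actual) / mean(actual)
}

# Toy vectors (hypothetical) standing in for fitted probabilities and observed outcomes.
pred   <- c(0.05, 0.30, 0.08, 0.60, 0.20)
actual <- c(FALSE, TRUE, TRUE, TRUE, FALSE)
sens <- sensitivity(pred, actual)

# With the survey package, the replicate-based call is then a separate statement:
# withReplicates(fitted, sensitivity, actual = <logical outcome vector>)
```

Note that glm10$PAIFRQ3M_A is unlikely to hold the outcome, since a fitted model object does not store variables under their original names; the observed response would have to come from the design's data or from the model object itself.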

Upvotes: 2

Ane

Reputation: 365

You can try the pROC package.

To split the data, you first need to decide on the split. For example, you might use half as training data and half as test data. Assume dataset is your data frame and that it has 10,000 rows:

default_idx = sample(nrow(dataset), 5000)
default_trn = dataset[default_idx, ]
default_tst = dataset[-default_idx, ]
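
Since the question asks about splitting a survey design rather than a data frame, one possible approach (a sketch, not the only way) is to store the split as a logical column before building the design and then call subset() on the design object, which keeps the original strata and cluster information. The toy data frame and the names psu, stratum, wt, y and is_train are hypothetical. The survey calls only run if the survey package is installed.

```r
set.seed(1)

# Toy data (hypothetical): 5 strata, 2 PSUs per stratum, 100 respondents.
toy <- data.frame(
  psu     = rep(1:2, times = 50),
  stratum = rep(1:5, each = 20),
  wt      = runif(100, 1, 10),
  y       = rbinom(100, 1, 0.3)
)

# Flag a random 80% of rows as training data.
toy$is_train <- seq_len(nrow(toy)) %in% sample(nrow(toy), 0.8 * nrow(toy))

if (requireNamespace("survey", quietly = TRUE)) {
  # Build the design once on the full data, then subset the design itself.
  des <- survey::svydesign(id = ~psu, strata = ~stratum, weights = ~wt,
                           data = toy, nest = TRUE)
  train_des <- subset(des, is_train)
  test_des  <- subset(des, !is_train)
}
```

A model could then be fitted with svyglm() on train_des and evaluated with predict() on the test rows.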

You can then get the ROC like this:

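library(pROC)  # provides roc()

# Fit a logistic regression on the training half (DV and IV are placeholders)
model_glm = glm(DV ~ IV, data = default_trn, family = "binomial")

# Predicted probabilities for the test half, then the ROC curve with the AUC printed on the plot
test_prob = predict(model_glm, newdata = default_tst, type = "response")
test_roc = roc(default_tst$DV ~ test_prob, plot = TRUE, print.auc = TRUE)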
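
The ROC above treats every row equally. For survey data you may want the sampling weights to enter the true and false positive rates. The following is a minimal base R sketch of a weighted ROC curve and a trapezoidal AUC. The function names weighted_roc and weighted_auc and the toy inputs are hypothetical, and this gives only a weighted point estimate, not design-based standard errors.

```r
# Weighted ROC points: for each threshold, the weighted share of
# positives (tpr) and negatives (fpr) with a score at or above it.
weighted_roc <- function(prob, actual, w) {
  thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE), -Inf)
  tpr <- sapply(thresholds, function(t) sum(w[prob >= t & actual == 1]) / sum(w[actual == 1]))
  fpr <- sapply(thresholds, function(t) sum(w[prob >= t & actual == 0]) / sum(w[actual == 0]))
  data.frame(threshold = thresholds, tpr = tpr, fpr = fpr)
}

# Area under the curve by the trapezoidal rule over the (fpr, tpr) points.
weighted_auc <- function(roc_df) {
  sum(diff(roc_df$fpr) * (head(roc_df$tpr, -1) + tail(roc_df$tpr, -1)) / 2)
}

# Toy example (hypothetical scores, outcomes, and weights).
prob   <- c(0.9, 0.4, 0.6, 0.1)
actual <- c(1, 1, 0, 0)
w      <- c(1, 1, 1, 1)
auc <- weighted_auc(weighted_roc(prob, actual, w))
```

With equal weights this reduces to the ordinary empirical AUC; in practice w would be the survey weights of the test rows, such as WTFA_A in the question.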

For a more detailed explanation, see for example: https://daviddalpiaz.github.io/r4sl/logistic-regression.html#roc-curves

Upvotes: 1
