eac2222

Reputation: 191

Get predictions from clogitLasso() model?

How would I get predictions from a clogitLasso model?

clogitLasso() gives me a sequence of penalty weights and the covariate coefficients that go with them, but what I'd like to do next is choose one of those weights and predict using the associated model. Then I can evaluate the model using AUC or some such.

Open to suggestions using a different library, as well.

(Open to getting bounced to Cross Validated, as well, but this isn't really a theoretical question.)

Upvotes: 1

Views: 581

Answers (1)

eac2222

Reputation: 191

There's no predict() function for clogitLasso(), but I was overthinking this. You can do the matrix multiplication of the data by the coefficients yourself.

For instance:

First we'll simulate some data. 360 observations, in 180 case/control pairs. case is coded 1/0, and set numbers the 180 pairs. There are two covariates: e1 is noise, and x1 is associated with the outcome, case.

library("clogitLasso")
set.seed(0)
N <- 360
mm <- data.frame(case=rep(c(1, 0), times=N/2))
mm$set <- rep(1:(N/2), each=2)
mm$e1 <- rnorm(n=N, mean=5, sd=10)
mm$x1 <- mm$case*10 + rnorm(n=N, mean=0, sd=3)

To get predictions from clogitLasso we need to normalize the covariates (mean = 0, sd = 1) ourselves before putting the data into the model. (Otherwise clogitLasso translates the coefficients back to the "original scale", which doesn't match the data we'll multiply by hand below.)

mm[, c("e1", "x1")] <- scale(mm[, c("e1", "x1")], center=TRUE, scale=TRUE)

Then build the model:

model <- clogitLasso(X=as.matrix(mm[, c("e1", "x1")]), y=as.matrix(mm$case), 
    strata=mm$set, standardize=FALSE)

We need to choose which value of the penalty weight we want to test the predictions for -- here we'll choose the 10th, just because.
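Before indexing into the coefficients, it can help to peek at the fitted object. This is just a sketch: the coefficient matrix beta (one row per penalty value) is what gets used below, but the other element names can vary by package version, so check names(model) rather than taking my word for them.

# See what the fitted object contains; element names can vary by package version.
names(model)

# Coefficients at the 10th penalty value (one row of the beta matrix).
model$beta[10, ]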

Then we multiply the (now normalized) input data by the coefficients ("betas") for that penalty value to attempt to predict the original outcomes -- the value of case:

handMadePredictions <- as.matrix(mm[, c("e1", "x1")]) %*% model$beta[10, ]

This is the linear predictor, which we need to transform back to the probability scale for prediction:

logistic <- function(logOdds) {
  return(exp(logOdds) / (exp(logOdds) + 1))
}

handMadePredictions <- logistic(handMadePredictions)
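Base R's plogis() is the same inverse-logit transformation, if you'd rather not write the helper yourself:

# Equivalent, using base R's built-in inverse-logit:
handMadePredictions <- plogis(as.matrix(mm[, c("e1", "x1")]) %*% model$beta[10, ])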

The original outcome -- case -- was a series of alternating ones and zeros. We can see that this model recovered those outcomes from the inputs quite well, either by inspecting round(handMadePredictions) or with a confusion matrix:

table("predicted"=round(handMadePredictions), "Case/control"=mm$case)

         Case/control
predicted   0   1
        0 172  12
        1   8 168
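To get the AUC mentioned in the question, those predicted probabilities can go straight into an ROC routine. A minimal sketch using the pROC package (an assumption on my part -- any ROC library will do):

library("pROC")

# AUC of the hand-made predicted probabilities against the true case/control labels.
rocObj <- roc(response = mm$case, predictor = as.numeric(handMadePredictions))
auc(rocObj)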

Note that in this toy example there are no stratum effects -- the association between x1 and case is the same no matter what set the datapoints are in. In this simplified situation there is no need for conditional logistic regression; regular logistic regression will work just fine. But I haven't been able to get plausible prediction results from clogitLasso() when there are stratum effects, which is a whole other question.
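That claim is easy to check on this toy data: here's a quick sketch of the ordinary (unconditional) logistic regression baseline with glm(), ignoring the pairing.

# Baseline: plain logistic regression ignoring the strata.
glmFit <- glm(case ~ e1 + x1, family = binomial, data = mm)
glmPredictions <- predict(glmFit, type = "response")
table("predicted" = round(glmPredictions), "Case/control" = mm$case)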

Upvotes: 1
