Reputation: 191
How would I get predictions from a clogitLasso model?
It will give me a sequence of penalty weights, and the covariate coefficients that go with them, but what I'd like to do next is to choose one of those weights and predict using the associated model. Then I can evaluate the model using AUC or some such.
Open to suggestions using a different library, as well.
(Open to getting bounced to CrossValidated, as well, but this isn't really a theoretical question...)
Upvotes: 1
Views: 581
Reputation: 191
There's no predict() function for clogitLasso(), but I was overthinking this. You can do the matrix multiplication of the data by the coefficients yourself.
For instance:
First we'll simulate some data: 360 observations, in 180 case/control pairs. case is coded 1/0, and set numbers the 180 pairs. There are two covariates: e1 is noise, and x1 is associated with the outcome, case.
library("clogitLasso")
set.seed(0)
N <- 360
mm <- data.frame(case=rep(c(1, 0), times=N/2))
mm$set <- rep(1:(N/2), each=2)
mm$e1 <- rnorm(n=N, mean=5, sd=10)
mm$x1 <- mm$case*10 + rnorm(n=N, mean=0, sd=3)
To get predictions from clogitLasso we need to normalize the covariates (mean = 0, sd = 1) ourselves, before putting the data into the model. (Otherwise clogitLasso translates the coefficients back to the "original scale", which is useless here.)
mm[, c("e1", "x1")] <- scale(mm[, c("e1", "x1")], center=TRUE, scale=TRUE)
Then build the model:
model <- clogitLasso(X=as.matrix(mm[, c("e1", "x1")]), y=as.matrix(mm$case),
strata=mm$set, standardize=FALSE)
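Assuming, as the indexing below does, that model$beta holds one row of coefficients per penalty weight, you can check how many penalty values there are to choose among with:

nrow(model$beta)   # number of penalty weights fitted, one row of coefficients each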
We need to choose which value of the penalty weight we want to test the predictions for -- here we'll choose the 10th, just because.
And we multiply the original input data by the coefficients ("betas") to attempt to predict the original outcomes -- the value of case:
handMadePredictions <- as.matrix(mm[, c("e1", "x1")]) %*% model$beta[10, ]
This is the linear predictor, which we need to transform back to the probability scale for prediction:
logistic <- function(logOdds) {   # inverse logit: convert log-odds to probabilities
  return(exp(logOdds) / (exp(logOdds) + 1))
}
handMadePredictions <- logistic(handMadePredictions)
The original data -- case -- was a series of alternating ones and zeros. We can see that this model predicted those outcomes from the original inputs quite well, either by inspecting round(handMadePredictions) or with a confusion matrix:
table("predicted"=round(handMadePredictions), "Case/control"=mm$case)
         Case/control
predicted   0   1
        0 172  12
        1   8 168
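Since the question mentions AUC: one way to get it from these hand-made predictions (a sketch assuming the pROC package; any ROC/AUC implementation would do) is:

library("pROC")
roc_obj <- roc(response=mm$case, predictor=as.vector(handMadePredictions))
auc(roc_obj)   # area under the ROC curve for the chosen penalty weight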
Note that in this toy example there are no stratum effects -- the association between x1 and case is the same, no matter what set the data points are in. In this simplified situation there is no need for conditional logistic regression; regular logistic regression will work just fine. But I haven't been able to get plausible prediction results from clogitLasso() when there are stratum effects, which is a whole other question.
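And to actually pick a penalty weight, rather than taking the 10th "just because", the same hand-made prediction can be repeated for every row of model$beta and the results compared -- a sketch using simple classification accuracy (the AUC from the snippet above could be substituted):

X <- as.matrix(mm[, c("e1", "x1")])
accuracy <- sapply(seq_len(nrow(model$beta)), function(i) {
  p <- logistic(X %*% model$beta[i, ])   # hand-made predictions at penalty weight i
  mean(round(p) == mm$case)              # fraction of case/control labels recovered
})
which.max(accuracy)   # index of the penalty weight that classifies best

In practice you'd compare the penalty weights on held-out data rather than the training data, but the mechanics are the same.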
Upvotes: 1