Reputation: 91
With some help I performed LASSO regression on bootstrapped and multiply imputed datasets to build a diagnostic model that can distinguish disease A from disease B using a large number of predictor variables.
Eventually, I have the following table with the selected variables (which are all categorical variables with yes/no as outcome) and their coefficients:
Predictor | mean regression coefficient |
---|---|
Intercept | 10.141 |
var1 | 1.671 |
Var2 | -1.971 |
Var3 | -5.266 |
Var4 | -2.244 |
Var5 | 5.266 |
My question is: how can I use the above table to predict whether a new patient (one who was not used to build the model) has disease A or disease B?
I thought of the following:
Intercept + (1.671 (var1) x 0 or 1) - (1.971 (var2) x 0 or 1) - (5.266 (var3) x 0 or 1) ..... + (5.266 (var5) x 0 or 1) = X
Probability of having disease A (which was coded as 1 in the dataset) = e^X / (1+ e^X)
But is this approach correct?
I hope someone can help me with this!
Upvotes: 0
Views: 118
Reputation: 660
Yes, since you are describing logistic regression, the steps are correct. These are the steps to calculating a prediction from your model.
a) Multiply each coefficient by its x variable, making sure to include the intercept if applicable (with an x value of 1)
b) Sum the results of a); this sum is the log odds
c) Exponentiate the log odds to produce the odds
d) Calculate the final probability as odds / (1 + odds)
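Written out as a formula, the steps above look roughly like this. The symbols are my own notation, not something from your output: $\beta_0$ stands for the intercept, $\beta_i$ for the coefficient of the $i$-th predictor, and $x_i \in \{0, 1\}$ for the patient's value on that predictor. The worked line uses a made-up patient with var2, var3 and var4 equal to 1 and var1 and var5 equal to 0.

```latex
X = \beta_0 + \sum_{i=1}^{5} \beta_i x_i
\qquad
P(\text{disease A}) = \frac{e^{X}}{1 + e^{X}} = \frac{1}{1 + e^{-X}}

% Hypothetical patient: var2 = var3 = var4 = 1, var1 = var5 = 0
X = 10.141 - 1.971 - 5.266 - 2.244 = 0.660
\qquad
P(\text{disease A}) = \frac{1}{1 + e^{-0.660}} \approx 0.659
```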
You didn't mention a specific language, but here's some pseudo-code in Python using pandas/numpy, assuming a dataset `x_variables` and a pandas Series of `coefficients`.
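import numpy as np
# x_variables: one row per patient, one column per predictor, plus a column of 1s for the intercept
# coefficients: Series indexed by the same names as the columns of x_variables
transposed = x_variables.transpose()
scores = transposed.mul(coefficients, axis=0)
log_odds = scores.sum(axis=0, skipna=True)
odds = np.exp(log_odds)
final_scores = odds / (1 + odds)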
Edit: Same code in R, where `coefficients` is a vector of the coefficient values.
# multiply each column of x_variables by its coefficient (element-wise, via recycling)
scores <- t(t(x_variables) * coefficients)
# sum the scores by row to get the log odds, then exponentiate to get the odds
odds <- exp(rowSums(scores, na.rm = TRUE))
final_scores <- odds / (1 + odds)
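For a single patient you don't need a data frame at all. Below is a minimal sketch in plain Python that plugs in the coefficients from the table in the question. The function name `predict_probability` and the example patient are hypothetical, made up only for illustration, and the predictor names are assumed to be spelled `var1` to `var5`.

```python
import math

# Mean coefficients copied from the table in the question.
COEFFICIENTS = {
    "var1": 1.671,
    "var2": -1.971,
    "var3": -5.266,
    "var4": -2.244,
    "var5": 5.266,
}
INTERCEPT = 10.141


def predict_probability(patient):
    """Return P(disease A) for a dict of 0/1 predictor values."""
    # Steps a) and b): intercept plus the sum of coefficient * value.
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    # Steps c) and d): logistic transform, written as 1 / (1 + e^-x),
    # which is algebraically the same as e^x / (1 + e^x).
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical patient, invented purely for illustration.
new_patient = {"var1": 0, "var2": 1, "var3": 1, "var4": 1, "var5": 0}
probability = predict_probability(new_patient)
print(f"P(disease A) = {probability:.3f}")
```

To turn the probability into a diagnosis you still have to pick a cut-off. Calling it disease A when the probability is at least 0.5 is a common default, but that threshold is an assumption on my part, and the right value depends on how costly each kind of misclassification is for you.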
Upvotes: 0