Karima21

Reputation: 91

How to apply diagnostic prediction model to new data

With some help I performed LASSO regression on bootstrapped and multiply imputed datasets to build a diagnostic model that can distinguish disease A from disease B using a large number of predictor variables.

In the end, I have the following table with the selected variables (all of which are categorical yes/no variables) and their coefficients:

Predictor    Mean regression coefficient
Intercept    10.141
Var1          1.671
Var2         -1.971
Var3         -5.266
Var4         -2.244
Var5          5.266

My question is: how can I use the table above to predict whether a new patient (one that was not used to build the model) has disease A or disease B?

I thought of the following:

X = Intercept + (1.671 × Var1) - (1.971 × Var2) - (5.266 × Var3) - (2.244 × Var4) + (5.266 × Var5), where each variable is 0 (no) or 1 (yes)

Probability of having disease A (which was coded as 1 in the dataset) = e^X / (1+ e^X)

But is this approach correct?

I hope someone can help me with this!

Upvotes: 0

Views: 118

Answers (1)

Colin H

Reputation: 660

Yes, since you are describing logistic regression, the steps are correct. These are the steps for calculating a prediction from your model:

a) Multiply each coefficient by its x variable, making sure to include the intercept if applicable (with a value of 1)

b) Sum the results of a); this sum is the log odds

c) Exponentiate the log odds to get the odds

d) Calculate the final probability as odds / (1 + odds)
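
As a concrete illustration of the steps above, here is a minimal Python sketch that uses the coefficients from the question. The patient values are made up for illustration, and the function name predict_probability is hypothetical; this is only a sketch of the calculation, not a validated scoring tool.

```python
import math

# Mean regression coefficients from the table in the question.
coefficients = {
    "Intercept": 10.141,
    "Var1": 1.671,
    "Var2": -1.971,
    "Var3": -5.266,
    "Var4": -2.244,
    "Var5": 5.266,
}

def predict_probability(patient):
    """Return P(disease A) for a dict of 0/1 predictor values (hypothetical helper)."""
    # The intercept always enters with a value of 1.
    x = {"Intercept": 1, **patient}
    # Multiply each coefficient by its value and sum: this is the log odds.
    log_odds = sum(coefficients[name] * x[name] for name in coefficients)
    # Exponentiate the log odds to get the odds.
    odds = math.exp(log_odds)
    # Convert the odds to a probability.
    return odds / (1 + odds)

# Hypothetical new patient (yes = 1, no = 0); not real data.
new_patient = {"Var1": 1, "Var2": 1, "Var3": 1, "Var4": 1, "Var5": 0}
probability = predict_probability(new_patient)
print(round(probability, 3))
```

For this made-up patient the log odds are 10.141 + 1.671 - 1.971 - 5.266 - 2.244 = 2.331, which corresponds to a probability of disease A of roughly 0.91.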

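You didn't mention a specific language, but here's some pseudo-code in Python using pandas/numpy, assuming a DataFrame x_variables (one row per patient, one column per predictor, plus an intercept column of 1s if applicable) and a pandas Series of coefficients indexed by the same column names.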

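import numpy as np

# one row per variable, one column per patient; multiply each row by its coefficient
scores = x_variables.transpose().mul(coefficients, axis = 0)
# sum over the variables to get each patient's log odds (missing values are skipped)
log_odds = scores.sum(axis = 0, skipna = True)
odds = np.exp(log_odds)
final_scores = odds / (1 + odds)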

Edit: Same code in R, where coefficients is a vector of the coefficient values.

# multiply each column of x_variables by its coefficient (element-wise, not a matrix product)
scores <- t(t(x_variables) * coefficients)

# sum the scores by row to get the log odds, then exponentiate to get the odds
odds <- exp(rowSums(scores, na.rm = TRUE))
final_scores <- odds / (1 + odds)
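
To turn the probability into a disease A / disease B call, you still need a cutoff. The sketch below, in Python, assumes a cutoff of 0.5, which is only a placeholder; a suitable threshold depends on how you want to trade sensitivity against specificity. The function names sigmoid and classify are hypothetical. It also computes the probability in a way that avoids overflow when the summed score is very large.

```python
import math

def sigmoid(log_odds):
    """Convert log odds to a probability without overflowing for large inputs."""
    if log_odds >= 0:
        # exp(-log_odds) is at most 1 here, so it cannot overflow.
        return 1 / (1 + math.exp(-log_odds))
    # log_odds is negative, so exp(log_odds) is below 1 and cannot overflow.
    odds = math.exp(log_odds)
    return odds / (1 + odds)

def classify(log_odds, threshold=0.5):
    """Return 'A' if P(disease A) >= threshold, else 'B' (threshold is an assumption)."""
    return "A" if sigmoid(log_odds) >= threshold else "B"

# Example with an arbitrary summed score (log odds) of 2.331.
label = classify(2.331)
```

Disease A was coded as 1 in the question, so a probability at or above the cutoff maps to A and anything below it maps to B.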

Upvotes: 0
