How to apply diagnostic prediction model to new data

Question

With some help, I performed LASSO regression on bootstrapped and multiply imputed datasets to build a diagnostic model that can distinguish disease A from disease B using a large number of predictor variables.

I ended up with the following table of selected variables (all of them categorical, with yes/no values) and their coefficients:

Predictor	mean regression coefficient
Intercept	10.141
var1	1.671
Var2	-1.971
Var3	-5.266
Var4	-2.244
Var5	5.266

My question is: how can I use the table above to predict whether a new patient (one who was not used to build the model) has disease A or disease B?

I thought of the following:

Intercept + (1.671 (var1) x 0 or 1) - (1.971 (var2) x 0 or 1) - (5.266 (var3) x 0 or 1) ..... + (5.266 (var5) x 0 or 1) = X

Probability of having disease A (which was coded as 1 in the dataset) = e^X / (1+ e^X)

But is this approach correct?

I hope someone can help me with this!

Colin H · Accepted Answer

Yes, since you are describing logistic regression, the steps are correct. These are the steps for calculating a prediction from your model:

a) Multiply each coefficient by its x variable, making sure to include the intercept if applicable (with an x value of 1)

b) Sum the results of a). This sum is the log odds (your X)

c) Exponentiate the log odds to produce the odds

d) Calculate the final probability with odds / (1 + odds)
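
Written out as a formula with the coefficients from the table in the question, the steps above amount to the following. The symbols x_1 to x_5 are shorthand introduced here for var1 to var5, each coded 0 (no) or 1 (yes); X is the sum from the question.

```latex
\begin{aligned}
X &= 10.141 + 1.671\,x_1 - 1.971\,x_2 - 5.266\,x_3 - 2.244\,x_4 + 5.266\,x_5,
  \qquad x_i \in \{0, 1\} \\
\text{odds} &= e^{X} \\
P(\text{disease A}) &= \frac{e^{X}}{1 + e^{X}} = \frac{1}{1 + e^{-X}}
\end{aligned}
```

The last two forms of the probability are algebraically identical; the second avoids computing the odds as a separate step.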

You didn't mention a specific language, but here's a sketch in Python using pandas/numpy, assuming a DataFrame x_variables (one row per patient, including a column of 1s for the intercept) and a pandas Series of coefficients indexed by the same column names.

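import numpy as np

# predictors as rows, patients as columns, so rows align with the coefficient index
transposed = x_variables.transpose()
# multiply each predictor row by its coefficient
scores = transposed.mul(coefficients, axis = 0)
# sum over predictors: one log-odds value per patient
log_odds = scores.sum(axis = 0, skipna = True)
odds = np.exp(log_odds)
final_scores = odds / (1 + odds)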

Edit: Same code in R, where coefficients is a vector of the coefficient values.

# do the scoring via element-wise multiplication (each column times its coefficient)
scores <- t(t(x_variables) * coefficients)

# sum the scores by row to get the log odds, then exponentiate to get the odds
log_odds <- rowSums(scores, na.rm = TRUE)
odds <- exp(log_odds)
final_scores <- odds / (1 + odds)
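
For a single new patient, the same calculation can be sketched in plain Python using the coefficients from the table in the question. This is only an illustration under stated assumptions: the function name predict_probability, the lowercase keys var1 to var5, and the example patient are made up for the sketch, and yes/no answers are assumed to be coded as 1/0 exactly as they were when the model was fitted.

```python
import math

# Mean coefficients copied from the table in the question.
# Key names are normalised to lowercase here (an assumption for this sketch).
COEFFICIENTS = {
    "Intercept": 10.141,
    "var1": 1.671,
    "var2": -1.971,
    "var3": -5.266,
    "var4": -2.244,
    "var5": 5.266,
}


def predict_probability(patient):
    """Return P(disease A) for a dict of predictors coded 1 (yes) or 0 (no)."""
    # Start from the intercept, which always enters with an x value of 1.
    log_odds = COEFFICIENTS["Intercept"]
    for name, coefficient in COEFFICIENTS.items():
        if name == "Intercept":
            continue
        log_odds += coefficient * patient[name]
    # Logistic function: equivalent to odds / (1 + odds) with odds = exp(log_odds).
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical new patient: "yes" on var2, var3 and var4, "no" on var1 and var5.
new_patient = {"var1": 0, "var2": 1, "var3": 1, "var4": 1, "var5": 0}
probability = predict_probability(new_patient)
print(f"P(disease A) = {probability:.3f}")
```

For this made-up patient the log odds are 10.141 - 1.971 - 5.266 - 2.244 = 0.660, which gives a probability of disease A of roughly 0.66. Turning that probability into an A-or-B call still requires choosing a cut-off, which the coefficients alone do not provide.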
