Emos

Reputation: 75

How to externally validate a logistic model by using external coefficients in R

I am trying to externally validate a logistic model using the published coefficients. As an example, the model I would like help validating is:

glm(outcome ~ sex + age + sex:age, family = "binomial")

Where sex is a factor (e.g., M or F) and age is a continuous value. The coefficients are:

Intercept: -2.827381;
M sex: 0.286466741;
Age: -0.036205346;
sex*age: -0.151205539.

The external validation dataset is a mids (mice) 100 imputations dataset called imputed_data_outcomes. So what I was trying was:

model_linear_predictors <- expression(-2.827381 + (0.286466741*as.numeric(sex))+ (-0.036205346*age) + (-0.151205539*(as.numeric(sex)*age)))
linear_predictors <-with (imputed_data_outcomes, model_linear_predictors)

I then want to use these linear predictors to calculate the predicted probabilities on the external dataset, etc. However, this step does not work, and I believe this is due to an incorrect specification of the sex*age term, as it works fine if I remove the interaction term.

Could you please advise on the correct way of dealing with this? Many thanks in advance.

Upvotes: 2

Views: 206

Answers (1)

Jinjin

Reputation: 610

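Emos, with(data, expr, ...) and within(data, expr, ...) evaluate the expression expr using the columns of data, which here means computing the predictions. with() returns the result of the expression, while within() returns a copy of data with any newly assigned variables added.

Note that expr must be written in terms of the column names in data. For example: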

> df <- data.frame(x=c(1, 2, 3, 4),
                  y=c(2, 2, 3, 4))
 
> with(df, x * y)
[1]  2  4  9 16
> within(df, z <- x * y)
  x y  z
1 1 2  2
2 2 2  4
3 3 3  9
4 4 4 16

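In your case, you will need to use the column names from imputed_data_outcomes and write something like

within(imputed_data_outcomes, prediction <- -2.827381 + (0.286466741*as.numeric(sex == "M")) + (-0.036205346*age) + (-0.151205539*(as.numeric(sex == "M")*age)))

Note that as.numeric(sex) on a factor returns the level codes (1 and 2) rather than a 0/1 indicator, so the comparison sex == "M" is used here, assuming F is the reference level of the published model. Just make sure the column names are consistent.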
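
To make the calculation concrete, here is a minimal sketch of going from the published coefficients to a linear predictor and a predicted probability. It makes several assumptions: F is taken as the reference level of sex (so the "M sex" coefficient multiplies a 0/1 indicator), add_predictions() is a hypothetical helper name, and the toy list named completed merely stands in for the completed data sets, which you could obtain from a mids object with something like mice::complete(imputed_data_outcomes, action = "all").

```r
# Published coefficients from the question
# (assumption: F is the reference level, so M is coded 1 and F is coded 0)
coefs <- c(intercept = -2.827381, sexM = 0.286466741,
           age = -0.036205346, sexM_age = -0.151205539)

# Hypothetical helper: adds the linear predictor and the predicted
# probability to a single data frame
add_predictions <- function(d) {
  # 0/1 indicator; as.numeric() on the factor itself would give 1/2
  male <- as.numeric(d$sex == "M")
  d$lp <- coefs[["intercept"]] +
    coefs[["sexM"]] * male +
    coefs[["age"]] * d$age +
    coefs[["sexM_age"]] * male * d$age
  # inverse logit: 1 / (1 + exp(-lp))
  d$prob <- plogis(d$lp)
  d
}

# Toy stand-in for the completed (imputed) data sets; values are made up
completed <- list(
  data.frame(sex = factor(c("F", "M")), age = c(50, 60)),
  data.frame(sex = factor(c("M", "F")), age = c(40, 70))
)

# Apply the same published model to every completed data set
predicted <- lapply(completed, add_predictions)
```

Each element of predicted is then a data frame with lp and prob columns, which can be compared with the observed outcome in that imputed data set before pooling any performance measures across imputations.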

Upvotes: 3
