I am trying to externally validate a logistic model using the published coefficients. As an example, the model I would like help validating is:
glm(outcome ~ sex + age + sex:age, family = "binomial")
where sex is a factor (M or F) and age is continuous. The coefficients are:
Intercept: -2.827381
Sex (M): 0.286466741
Age: -0.036205346
Sex (M) x age interaction (sex:age): -0.151205539
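Written out, the fitted model is as follows. This assumes the published "M sex" coefficient belongs to a 0/1 indicator $I_M$ that is 1 for males and 0 for females, which is what glm's default treatment coding produces when F is the reference level:

$$
\mathrm{LP} = -2.827381 + 0.286466741\,I_M - 0.036205346\,\mathrm{age} - 0.151205539\,I_M\,\mathrm{age},
\qquad
\hat{p} = \frac{1}{1 + e^{-\mathrm{LP}}}
$$

For females the two $I_M$ terms drop out. For males, both the intercept and the age slope shift.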
The external validation dataset is a mids object (from mice) with 100 imputations, called imputed_data_outcomes. What I tried was:
model_linear_predictors <- expression(-2.827381 + (0.286466741 * as.numeric(sex)) + (-0.036205346 * age) + (-0.151205539 * (as.numeric(sex) * age)))
linear_predictors <- with(imputed_data_outcomes, model_linear_predictors)
I then planned to use these linear predictors to calculate the predicted probabilities on the external datasets, and so on. However, this step does not work. I believe the cause is the wrong assignment of the sex*age coefficient, because it works fine if I remove the interaction term.
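The calculation described above can be sketched on a single data frame. The toy data frame `toy` and the variable names `b0`, `b_sex`, `b_age`, `b_int` and `male` are hypothetical. The sketch also assumes the "M sex" coefficient refers to a 0/1 male indicator. One detail worth checking: `as.numeric()` on a factor with levels F and M returns the level codes 1 and 2, not 0 and 1. For that reason the sketch builds the indicator explicitly with `sex == "M"`, which also works if `sex` is stored as character.

```r
# Hypothetical toy data: sex is a factor with F as the reference level
toy <- data.frame(
  sex = factor(c("F", "M", "M"), levels = c("F", "M")),
  age = c(50, 50, 60)
)

# Published coefficients, assuming treatment coding (M = 1, F = 0)
b0    <- -2.827381    # intercept
b_sex <-  0.286466741 # M sex
b_age <- -0.036205346 # age
b_int <- -0.151205539 # sex:age interaction

# as.numeric(toy$sex) would return the level codes 1 and 2, not 0/1,
# so build the 0/1 indicator explicitly
male <- as.numeric(toy$sex == "M")

# Linear predictor, then inverse logit for the predicted probability
toy$lp <- b0 + b_sex * male + b_age * toy$age + b_int * male * toy$age
toy$p  <- plogis(toy$lp)  # same as 1 / (1 + exp(-toy$lp))
print(toy)
```

`plogis()` is base R's logistic distribution function, so it serves directly as the inverse logit.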
Could you please advise on the correct way of dealing with this? Many thanks in advance.
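with(data, expr, ...) and within(data, expr, ...) evaluate the expression expr using the columns of data, which here is how you do the calculation (in this case, the predictions). with() returns the result of the expression, while within() returns a copy of data with the newly created columns added.
Note that expr needs to be written using the column names in data; see below for an example: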
> df <- data.frame(x=c(1, 2, 3, 4),
y=c(2, 2, 3, 4))
> with(df, x * y)
[1] 2 4 9 16
> within(df, z <-x * y)
x y z
1 1 2 2
2 2 2 4
3 3 3 9
4 4 4 16
In your case, you need to use the column names from imputed_data_outcomes and write something like:
within(imputed_data_outcomes, prediction <- -2.827381 + (0.286466741*as.numeric(sex))+ (-0.036205346*age) + (-0.151205539*(as.numeric(sex)*age)))
Just make sure those column names match the ones in your data.
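Because imputed_data_outcomes is a mids object rather than a plain data frame, one possible route is to stack the completed datasets first and then apply the within() idea above. With the real data, the stacking step would presumably be `long <- mice::complete(imputed_data_outcomes, action = "long")`. That returns a single data frame with an `.imp` column (imputation number) and an `.id` column (row id). The sketch below builds a small hypothetical data frame of that shape by hand, with 2 imputations of 3 patients, so it runs without mice. It also assumes the columns are named `sex` and `age`, as in the question.

```r
# Stand-in for mice::complete(imputed_data_outcomes, action = "long"):
# hypothetical stacked data, 2 imputations x 3 patients
long <- data.frame(
  .imp = rep(1:2, each = 3),
  .id  = rep(1:3, times = 2),
  sex  = factor(c("F", "M", "M", "F", "M", "F"), levels = c("F", "M")),
  age  = c(50, 62, 71, 50, 62, 71)
)

# within() evaluates the block using the columns of `long`
long <- within(long, {
  male <- as.numeric(sex == "M")   # 0/1 indicator for the "M sex" term
  lp   <- -2.827381 + 0.286466741 * male +
    -0.036205346 * age +
    -0.151205539 * male * age      # sex:age interaction term
  p    <- plogis(lp)               # predicted probability
})

# Predicted probabilities split into one vector per imputation
p_by_imp <- split(long$p, long$.imp)
str(p_by_imp)
```

Keeping the `.imp` column means each imputation's predictions stay separate, so the validation measures can be computed per imputation.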