Reputation: 173
I am wondering how I can predict outcomes in the 2018:2030 time frame using my data, which includes variables such as YEAR, AGE, FEMALE, and RACE.
I am using a svydesign setup for this prediction, and I get accurate results, below, by just doing simple tabulating.
hiptotal1 <- svyby(~hipPJI, ~YEAR, design = mydesign, FUN = svytotal, vartype = "ci")
YEAR hipPJI ci_l ci_u
2015 2015 10580.001 9861.132 11298.869
2016 2016 11390.000 10601.341 12178.659
2017 2017 11800.003 10961.674 12638.332
Then I try to create a prediction model, and I am definitely doing things wrong, but I am not sure what. My code is:
AdjPoissonHip <- svyglm(hipPJI ~ YEAR, family = poisson(), design = mydesign)
years <- data.frame(YEAR = 2018:2030)
predictHip <- predict(AdjPoissonHip, newdata = years, type = "response", se.fit =TRUE, interval = "predict")
response SE
1 0.0082704 2e-04
2 0.0084020 2e-04
3 0.0085357 2e-04
4 0.0086715 2e-04
5 0.0088095 2e-04
6 0.0089496 3e-04
7 0.0090920 3e-04
8 0.0092367 3e-04
9 0.0093837 3e-04
10 0.0095330 4e-04
11 0.0096847 4e-04
12 0.0098388 4e-04
13 0.0099953 5e-04
I am not sure why my results are so off. Am I just using the wrong option to generate my results?
A second part to this question is if I want to include AGE, FEMALE, and RACE as predictor values but still just want to look at estimates for year, but not stratified by AGE, FEMALE, and RACE, how could I do this?
Upvotes: 0
Views: 589
Reputation: 2765
You don't say what the structure of your data is. One possibility is that you have a record for each person in the dataset.
If so, your call to svytotal
would weight and add up the numbers of hip fractures over all the records for each year, and give an estimate of the population total number of hip fractures in each year. These numbers would be large, as you show.
Your call to svyglm
would be modelling a data set where most records have zero hip fractures and some have one (or maybe more). The fitted probabilities from this model would be at the individual level, giving expected numbers of hip fractures for a typical individual in the population in years 2015, 2016, 2017. These numbers would be small, as you show.
It's possible to get population total predictions from predict.svyglm
, but you need population totals of the covariates. From the help page
predict gives fitted values and sampling variability for specific new values of covariates. When newdata are the population mean it gives the regression estimator of the mean, and when newdata are the population totals and total is specified it gives the regression estimator of the population total. Regression estimators of mean and total can also be obtained with calibrate.
So, you'd need to decide what your population totals were for AGE, FEMALE, and RACE, and call predict.svyglm
with those totals in the newdata
argument, and with the total=
argument giving the population size.
If you think about it, you need population information about age, gender, and race to do the predictions. For example, the number of hip fractures in 2030 will depend on the age distribution of the population in 2030.
Upvotes: 1