How can I create an age-stratified linear regression?

Question

I want to do some predictive modeling with a linear regression, where the Year is the independent variable and each age bracket (15-17 years, 18-19 years, 20–24 years, etc) is the dependent variable. I would like to predict the birth rates for each age group based on historical data (predict birth rates for each age group into 10 years into the future). The data within each age bracket variable represents birth rates. The Crude Birth Rate variable can be ignored for the purposes of this analysis, as I want to perform the analysis within each age bracket (rather than aggregated).

My data are below:

> dput(age)
structure(list(Year = c(1950, 1960, 1970, 1980, 1981, 1982, 1983, 
1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 
1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 
2017, 2018, NA), `Crude Birth Rate` = c(24.1, 23.7, 18.4, 15.9, 
15.8, 15.9, 15.6, 15.6, 15.8, 15.6, 15.7, 16, 16.4, 16.7, 16.2, 
15.8, 15.4, 15, 14.6, 14.4, 14.2, 14.3, 14.2, 14.4, 14.1, 14, 
14.1, 14, 14, 14.3, 14.3, 14, 13.5, 13, 12.7, 12.6, 12.4, 12.5, 
12.4, 12.2, 11.8, 11.6, NA), `15-17 years` = c(40.7, 43.9, 38.8, 
32.5, 32, 32.3, 31.8, 31, 31, 30.5, 31.7, 33.6, 36.4, 37.5, 38.6, 
37.6, 37.5, 37.2, 35.5, 33.3, 31.4, 29.9, 28.2, 26.9, 24.5, 23.1, 
22.2, 21.8, 21.1, 21.6, 21.7, 21.1, 19.6, 17.3, 15.4, 14.1, 12.3, 
10.9, 9.9, 8.8, 7.9, 7.2, NA), `18-19 years` = c(132.7, 166.7, 
114.7, 82.1, 80, 79.4, 77.4, 77.4, 79.6, 79.6, 78.5, 79.9, 84.2, 
88.6, 94, 93.6, 91.1, 90.2, 87.7, 84.7, 82.1, 80.9, 79, 78.1, 
75.5, 72.2, 69.6, 68.7, 68.4, 71.2, 71.7, 68.2, 64, 58.2, 54.1, 
51.4, 47.1, 43.8, 40.7, 37.5, 35.1, 32.3, NA), `20–24 years` = c(196.6, 
258.1, 167.8, 115.1, 112.2, 111.6, 107.8, 106.8, 108.3, 107.4, 
107.9, 110.2, 113.8, 116.5, 115.3, 113.7, 111.3, 109.2, 107.5, 
107.8, 107.3, 108.4, 107.9, 109.7, 105.6, 103.1, 102.3, 101.5, 
101.8, 105.5, 105.4, 101.8, 96.2, 90, 85.3, 83.1, 80.7, 79, 76.8, 
73.8, 71, 68, NA), `25–29 years` = c(166.1, 197.4, 145.1, 112.9, 
111.5, 111, 108.5, 108.7, 111, 109.8, 111.6, 114.4, 117.6, 120.2, 
117.2, 115.7, 113.2, 111, 108.8, 108.6, 108.3, 110.2, 111.2, 
113.5, 113.8, 114.7, 116.7, 116.5, 116.5, 118, 118.1, 115, 111.5, 
108.3, 107.2, 106.5, 105.5, 105.8, 104.3, 102.1, 98, 95.3, NA
), `30–34 years` = c(103.7, 112.7, 73.3, 61.9, 61.4, 64.1, 
64.9, 67, 69.1, 70.1, 72.1, 74.8, 77.4, 80.8, 79.2, 79.6, 79.9, 
80.4, 81.1, 82.1, 83, 85.2, 87.1, 91.2, 91.8, 92.6, 95.7, 96.2, 
96.7, 98.9, 100.6, 99.4, 97.5, 96.5, 96.5, 97.3, 98, 100.8, 101.5, 
102.7, 100.3, 99.7, NA), `35–39 years` = c(52.9, 56.2, 31.7, 
19.8, 20, 21.2, 22, 22.9, 24, 24.4, 26.3, 28.1, 29.9, 31.7, 31.9, 
32.3, 32.7, 33.4, 34, 34.9, 35.7, 36.9, 37.8, 39.7, 40.5, 41.6, 
43.9, 45.5, 46.4, 47.5, 47.6, 46.8, 46.1, 45.9, 47.2, 48.3, 49.3, 
51, 51.8, 52.7, 52.3, 52.6, NA), `40–44 years` = c(15.1, 15.5, 
8.1, 3.9, 3.8, 3.9, 3.9, 3.9, 4, 4.1, 4.4, 4.8, 5.2, 5.5, 5.5, 
5.9, 6.1, 6.4, 6.6, 6.8, 7.1, 7.4, 7.4, 8, 8.1, 8.3, 8.7, 9, 
9.1, 9.4, 9.6, 9.9, 10, 10.2, 10.3, 10.4, 10.4, 10.6, 11, 11.4, 
11.6, 11.8, NA), `45-54 years` = c(1.2, 0.9, 0.5, 0.2, 0.2, 0.2, 
0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.3, 
0.3, 0.4, 0.4, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.7, 
0.7, 0.7, 0.7, 0.7, 0.8, 0.8, 0.8, 0.9, 0.9, 0.9, NA)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -43L), spec = structure(list(
    cols = list(Year = structure(list(), class = c("collector_double", 
    "collector")), `Crude Birth Rate` = structure(list(), class = c("collector_double", 
    "collector")), `15-17 years` = structure(list(), class = c("collector_double", 
    "collector")), `18-19 years` = structure(list(), class = c("collector_double", 
    "collector")), `20–24 years` = structure(list(), class = c("collector_double", 
    "collector")), `25–29 years` = structure(list(), class = c("collector_double", 
    "collector")), `30–34 years` = structure(list(), class = c("collector_double", 
    "collector")), `35–39 years` = structure(list(), class = c("collector_double", 
    "collector")), `40–44 years` = structure(list(), class = c("collector_double", 
    "collector")), `45-54 years` = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

Maurits Evers · Accepted Answer

Bypassing a discussion on the suitability of a linear regression model for characterising birth rates, I'd probably start with a generalised linear mixed-effect model (treating the age band as the random effect).

First, let's reshape the data (I assume the original data is in data): Remove column "Crude Birth Rate", reshape from wide to long, and transform "Year" to time (in years) since min(data$Year).

df <- data %>%
    filter(!is.na(Year)) %>%
    select(-`Crude Birth Rate`) %>%
    pivot_longer(-Year) %>%
    mutate(t = Year - min(Year))

We can then fit the model (I'm using the Bayesian/rstan "version" of glmer). Here I am assuming normally distributed errors (which is not going to be correct, seeing that you are modelling birth rates, but will do for demonstration purposes).

library(rstanarm)
fit <- stan_glmer(
    value ~ 1 + t + (1 + t | name), 
    data = df)

Skipping model validation (which you shouldn't;-) we can visualise draws from the posterior predictive

library(tidybayes)
df %>%
    modelr::data_grid(
        name = unique(name),
        t = seq(min(t), max(t), by = 1)) %>%
    add_predicted_draws(fit) %>%
    ggplot(aes(t, value)) + 
    geom_point(data = df) + 
    geom_line(data = df) + 
    stat_lineribbon(aes(y = .prediction), fill = "orange", alpha = 0.2) +
    facet_wrap(~ name, ncol = 4) +
    scale_x_continuous(labels = ~ .x + min(df$Year))

How can I create an age-stratified linear regression?

Answers (2)

Related Questions