dais.johns

Reputation: 47

Perform poisson regression for each value in column

I have a long-form dataframe that I am performing a poisson regression on.

 'data.frame':  20000 obs. of  6 variables:
 $ cal_y  : int  2008 2008 2008 2008 2008 ...
 $ age_y  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ gender : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ cal_m  : int  9 7 8 1 6 11 2 10 3 4 ...
 $ n_outcome: int  276 187 164 352 229 250 332 267 348 291 ...
 $ n_atrisk : int  4645 4645 4645 4645 4645 4645 4645 4645 4645 4645 ...

glm(n_outcome ~ factor(cal_y) + factor(cal_m) + gender + offset(log(n_atrisk)),
    data = df, family = poisson)

I would like to know the coefficients of the exposure cal_y for the outcome n_outcome, estimated separately for every value of age_y, and preferably to be able to aggregate this information into one df.

I have tried several misguided versions of lapply() and tapply(). Currently, my best solution is to do this by hand:

glm(n_outcome ~ factor(cal_y) + factor(cal_m) + gender + offset(log(n_atrisk)),
    data = filter(df, age_y >= 0, age_y < 1), family = poisson)

But this is tedious (range(age_y) is 0 to 105), the results are not easily combined into a new df, and I'm not sure it is statistically correct to subset the data prior to performing the regression.

Any pointers, comments or help are appreciated.

Upvotes: 0

Views: 258

Answers (2)

alexwhitworth

Reputation: 4907

What you describe in the comments above is what happens when you use effects coding versus a means model. The two are equivalent and simply represent different constraints placed on the parameters (leave one level out vs. sum to zero), but it appears to be skewing your thinking here. If you code age_y as a categorical variable, you will get the appropriate output.
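A tiny illustration of that equivalence, using toy data and ordinary lm() for simplicity (hypothetical values, not the question's data): the two codings place different constraints on the parameters but produce identical fitted values.

```r
# Toy data: three groups of two observations each
x <- factor(c("a", "a", "b", "b", "c", "c"))
y <- c(1, 2, 3, 4, 5, 6)

f1 <- lm(y ~ x)      # default coding: intercept + leave-one-out contrasts
f2 <- lm(y ~ 0 + x)  # means model: one coefficient per level

all.equal(fitted(f1), fitted(f2))  # TRUE -- same model, different parameterization
```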

By using subset regression, you are not including all available information in each model, which is the statistically invalid part; it also inflates the type I error rate. You should use all available information in one model. Therefore, this is the correct specification:

# This is the default way that R handles things, leaving one level out.
glm(n_outcome ~ factor(age_y) + factor(cal_y) + factor(cal_m) + gender +
      offset(log(n_atrisk)),
    data = df, family = poisson(link = "log"))

# In contrast, this will provide an estimate for each level of
# factor(age_y), where the test statistic asks whether the coefficient
# is statistically different from zero.
glm(n_outcome ~ 0 + factor(age_y) + factor(cal_y) + factor(cal_m) + gender +
      offset(log(n_atrisk)),
    data = df, family = poisson(link = "log"))

The above is correct unless you expect different effects of cal_y, gender, and/or n_atrisk for each age_y, in which case you will need estimates of the interactions of these variables with age_y (which can still be specified in a single model).
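For instance, an age-by-calendar-year interaction can be added like this (a sketch on simulated data with hypothetical values, since the full df isn't shown):

```r
# Simulated data in the same shape as the question's data frame
set.seed(1)
df <- expand.grid(age_y = 0:2, cal_y = 2008:2010, cal_m = 1:12,
                  gender = factor(1:2))
df$n_atrisk  <- 4645
df$n_outcome <- rpois(nrow(df), lambda = 250)

# Single model: the interaction lets the cal_y effect differ by age_y
fit <- glm(n_outcome ~ factor(age_y) * factor(cal_y) + factor(cal_m) +
             gender + offset(log(n_atrisk)),
           data = df, family = poisson(link = "log"))

# The age-specific cal_y effects are the interaction coefficients
coef(fit)[grepl("age_y.*cal_y", names(coef(fit)))]
```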

If you wish to test if the levels of age_y are different from each other, you can test their contrasts.

That is, using (... 0 + factor(age_y) ...) -- which sounds like your preference -- gives you a test of each level of age_y against the null hypothesis of zero, but it does not provide a statistical test between, say, age_y:level1 and age_y:level2. To do that test, you need to test their contrasts.
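A contrast can be tested by hand with a Wald test built from coef() and vcov() (a base-R sketch on simulated data with hypothetical values; packages like multcomp automate this):

```r
# Simulated data in the same shape as the question's data frame
set.seed(1)
df <- expand.grid(age_y = 0:2, cal_y = 2008:2010, cal_m = 1:12,
                  gender = factor(1:2))
df$n_atrisk  <- 4645
df$n_outcome <- rpois(nrow(df), lambda = 250)

fit <- glm(n_outcome ~ 0 + factor(age_y) + factor(cal_y) + factor(cal_m) +
             gender + offset(log(n_atrisk)),
           data = df, family = poisson(link = "log"))

# Wald test of the contrast: age_y level 0 vs. age_y level 1
b <- coef(fit)
V <- vcov(fit)
est <- b["factor(age_y)0"] - b["factor(age_y)1"]
se  <- sqrt(V["factor(age_y)0", "factor(age_y)0"] +
            V["factor(age_y)1", "factor(age_y)1"] -
            2 * V["factor(age_y)0", "factor(age_y)1"])
z <- est / se
p <- 2 * pnorm(-abs(z))  # two-sided p-value for level0 == level1
```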

Upvotes: 1

David Robinson

Reputation: 78620

You can do this with dplyr and my broom package:

library(dplyr)
library(broom)

results <- df %>%
  group_by(age_y) %>%
  do(tidy(glm(n_outcome ~ factor(cal_y) + factor(cal_m) + gender + offset(log(n_atrisk)),
              data = ., family = poisson)))

This works because group_by and do perform the regression for each age_y value, and tidy then turns each regression into a data frame so the results can be recombined.

See the broom and dplyr vignette for more.
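If you'd rather avoid the extra packages, the same split-fit-recombine idea can be done in base R with split() and lapply() (a sketch on simulated data with hypothetical values, since the full df isn't shown):

```r
# Simulated data in the same shape as the question's data frame
set.seed(1)
df <- expand.grid(age_y = 0:2, cal_y = 2008:2010, cal_m = 1:12,
                  gender = factor(1:2))
df$n_atrisk  <- 4645
df$n_outcome <- rpois(nrow(df), lambda = 250)

# One regression per age_y level; keep only the cal_y coefficients
per_age <- lapply(split(df, df$age_y), function(d) {
  fit <- glm(n_outcome ~ factor(cal_y) + factor(cal_m) + gender +
               offset(log(n_atrisk)),
             data = d, family = poisson)
  cf <- summary(fit)$coefficients
  cf <- cf[grep("cal_y", rownames(cf)), , drop = FALSE]
  data.frame(age_y = d$age_y[1], term = rownames(cf), cf, row.names = NULL)
})
results <- do.call(rbind, per_age)  # one row per age_y x cal_y coefficient
```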

Upvotes: 2
