Ian
Ian

Reputation: 1165

Re-using result of function call in dplyr::mutate

I use dplyr::do with glm to fit a model by group to example data. I want to add columns with the upper and lower limits of the confidence interval: how can I avoid making the call to confint twice? In general, is there a way to assign generic output to a new column using dplyr::mutate?

df <- data.frame(
  x = rep(c("a", "b"), each=10),
  y = c(rpois(10, 0.5), rpois(10, 2.2)))

sdf <- df %>%
  group_by(x) %>%
  do(fit=glm(y ~ 1, poisson, data=.))

mutate(sdf,
  est=coef(fit),
  cil=confint(fit)[1],
  ciu=confint(fit)[2])

In short, I want this to work:

mutate(sdf, ci=confint(fit)) %>%
  mutate(cil=ci[1], ciu=ci[2])

If I use do again, I lose the fitted model and x.

Solution

What I actually used (learned from the accepted answer):

sdf <- df %>%
  group_by(x) %>%
  do({
    fit <- glm(y ~ 1, poisson, data=.)
    ci <- confint(fit)
    data.frame(
      est=coef(fit),
      cil=ci[1],
      ciu=ci[2])
  })

Upvotes: 0

Views: 212

Answers (2)

Nate
Nate

Reputation: 10671

all of the above comments are nice newer packages to help with your problem (I highly recommend purrr), but if you want to stick with do you can reformat it like this so you only are calling confint once per group:

sdf <- df %>%
    group_by(x) %>%
    do({fit <- glm(y ~ 1, poisson, data=.);
        data.frame(confint(fit), coef(fit))})

the output requires some work to get into a plotable format:

sdf %>% mutate(ci = rep(c("low", "high"), legnth.out = nrow(.))) %>% spread(ci, confint.fit.)

Upvotes: 1

Jake Kaupp
Jake Kaupp

Reputation: 8072

As asked in the comments, here's an approach using dplyr, purrr,tidyr and broom.

library(purrr)
library(tidyr)
library(dplyr)
library(broom)

sdf <- df %>%
  nest(y) %>%
  mutate(model = map(data, ~glm(y ~ 1, poisson, data = .))) %>% 
  unnest(map(model, tidy))

Source: local data frame [2 x 8]

       x            data        model        term   estimate std.error statistic      p.value
  (fctr)           (chr)        (chr)       (chr)      (dbl)     (dbl)     (dbl)        (dbl)
1      a <tbl_df [10,1]> <S3:glm, lm> (Intercept) -0.5108256 0.4082458 -1.251270 2.108361e-01
2      b <tbl_df [10,1]> <S3:glm, lm> (Intercept)  1.0296194 0.1889795  5.448311 5.085025e-08

I would read more about purrr, tidyr and broom via google, and the package vignettes. There is also a lot of good information on the RStudio Blog about tidyverse packages.

Upvotes: 2

Related Questions