tomw

Reputation: 3160

pmap_ variants operating on data.frames as lists

I have a recollection that purrr::pmap_* can treat a data.frame as a list but the syntax eludes me.

Imagine we wanted to fit a separate lm object for each combination of mtcars$vs and mtcars$am:

library(tidyverse)
library(broom)

d1 <- mtcars %>% 
  group_by(
    vs, am
  ) %>% 
  nest %>% 
  mutate(
    coef = data %>% 
      map(
        ~lm(mpg ~ wt, data = .) %>% 
          tidy
      )
  )

If I wanted to extract the coefficient estimates as an un-nested data.frame, and append the values of am and vs, I might try

d1[, -3] %>% 
  pmap_dfr(
    function(i, j, k)
      k %>% 
      mutate(
        vs = i,
        am = j
      )
  )

But this results in an error. Passing the same columns explicitly as an unnamed list has the desired effect:

list(
  d1$vs,
  d1$am,
  d1$coef
  ) %>% 
  pmap_dfr(
    function(i, j, k)
      k %>% 
      mutate(
        vs = i,
        am = j
      )
  )

Is there a succinct way for pmap_* to treat a data.frame as a list?

Upvotes: 1

Views: 56

Answers (2)

IceCreamToucan

Reputation: 28705

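This happens because d1[, -3] is a named list (its names are vs, am and coef), and pmap matches named elements to function arguments by name. Your function's arguments i, j and k match none of those names, hence the error. Your second list has no names attribute, so its elements are matched by position instead. If you unname d1[, -3], the first example works too. Calling list() in the second example makes no difference in itself (apart from dropping the names), because both objects are lists: data frames are lists.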

d1[, -3] %>% 
  unname %>% 
  pmap_dfr(
    function(i, j, k)
      k %>% 
      mutate(
        vs = i,
        am = j
      )
  )


# # A tibble: 8 x 7
#   term        estimate std.error statistic   p.value    vs    am
#   <chr>          <dbl>     <dbl>     <dbl>     <dbl> <dbl> <dbl>
# 1 (Intercept)    42.4      3.30      12.8  0.000213      0     1
# 2 wt             -7.91     1.14      -6.93 0.00227       0     1
# 3 (Intercept)    44.1      6.96       6.34 0.00144       1     1
# 4 wt             -7.77     3.36      -2.31 0.0689        1     1
# 5 (Intercept)    31.5      8.98       3.51 0.0171        1     0
# 6 wt             -3.38     2.80      -1.21 0.281         1     0
# 7 (Intercept)    25.1      3.51       7.14 0.0000315     0     0
# 8 wt             -2.44     0.842     -2.90 0.0159        0     0
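The name matching can be seen in isolation with a minimal sketch on toy data. The tibble df and its column names x and y are made up for illustration and are not part of the question. as.list() is applied before unname() only so that the sketch does not depend on how tibble handles removing names.

```r
library(purrr)
library(tibble)

# Toy data; the column names x and y are illustrative only
df <- tibble(x = 1:3, y = c(10, 20, 30))

# Named list, matching argument names: elements are matched by name,
# so the order of the function's arguments does not matter
by_name <- pmap_dbl(df, function(y, x) x + y)

# Names removed: elements are matched by position instead
by_position <- pmap_dbl(unname(as.list(df)), function(a, b) a + b)

# Named list, non-matching argument names: the call fails
mismatch <- tryCatch(
  pmap_dbl(df, function(a, b) a + b),
  error = function(e) "error"
)
```

Here by_name and by_position hold the same row-wise sums, while mismatch holds the string "error" because neither x nor y is an argument of the function.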

Alternatively, keep the names and rename the arguments of the function in your first code block to match the column names (or use ..1 etc.) for the same result:

d1[, -3] %>% 
  pmap_dfr(
    function(vs, am, coef)
      coef %>% 
      mutate(
        vs = vs,
        am = am
      )
  )


# # A tibble: 8 x 7
#   term        estimate std.error statistic   p.value    vs    am
#   <chr>          <dbl>     <dbl>     <dbl>     <dbl> <dbl> <dbl>
# 1 (Intercept)    42.4      3.30      12.8  0.000213      0     1
# 2 wt             -7.91     1.14      -6.93 0.00227       0     1
# 3 (Intercept)    44.1      6.96       6.34 0.00144       1     1
# 4 wt             -7.77     3.36      -2.31 0.0689        1     1
# 5 (Intercept)    31.5      8.98       3.51 0.0171        1     0
# 6 wt             -3.38     2.80      -1.21 0.281         1     0
# 7 (Intercept)    25.1      3.51       7.14 0.0000315     0     0
# 8 wt             -2.44     0.842     -2.90 0.0159        0     0

You could also use wap() from the experimental rap package:

library(rap)

d1[, -3] %>% 
  wap( ~ coef %>% 
          mutate(
            vs = vs,
            am = am)) %>% 
  bind_rows
# # A tibble: 8 x 7
#   term        estimate std.error statistic   p.value    vs    am
#   <chr>          <dbl>     <dbl>     <dbl>     <dbl> <dbl> <dbl>
# 1 (Intercept)    42.4      3.30      12.8  0.000213      0     1
# 2 wt             -7.91     1.14      -6.93 0.00227       0     1
# 3 (Intercept)    44.1      6.96       6.34 0.00144       1     1
# 4 wt             -7.77     3.36      -2.31 0.0689        1     1
# 5 (Intercept)    31.5      8.98       3.51 0.0171        1     0
# 6 wt             -3.38     2.80      -1.21 0.281         1     0
# 7 (Intercept)    25.1      3.51       7.14 0.0000315     0     0
# 8 wt             -2.44     0.842     -2.90 0.0159        0     0

Upvotes: 2

akrun

Reputation: 887571

We can use the positional placeholders of the formula shorthand (..1, ..2, etc.) to extract the components, which works regardless of the column names:

d1[, -3]  %>% 
    pmap_dfr(~ ..3 %>%
                  mutate(vs = ..1, am = ..2))
# A tibble: 8 x 7
#  term        estimate std.error statistic   p.value    vs    am
#  <chr>          <dbl>     <dbl>     <dbl>     <dbl> <dbl> <dbl>
#1 (Intercept)    42.4      3.30      12.8  0.000213      0     1
#2 wt             -7.91     1.14      -6.93 0.00227       0     1
#3 (Intercept)    44.1      6.96       6.34 0.00144       1     1
#4 wt             -7.77     3.36      -2.31 0.0689        1     1
#5 (Intercept)    31.5      8.98       3.51 0.0171        1     0
#6 wt             -3.38     2.80      -1.21 0.281         1     0
#7 (Intercept)    25.1      3.51       7.14 0.0000315     0     0
#8 wt             -2.44     0.842     -2.90 0.0159        0     0
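As a side note, purrr 1.0.0 and later mark the _dfr variants as superseded in favour of pmap() followed by list_rbind(). The following is a sketch of the same positional approach under that assumption. The tibble nested and its columns g, h and res are hypothetical stand-ins for vs, am and coef.

```r
library(purrr)   # assumes purrr >= 1.0.0 for list_rbind()
library(tibble)
library(dplyr)

# Hypothetical stand-in for d1[, -3]: two key columns and a list-column
nested <- tibble(
  g = c("a", "b"),
  h = c(1, 2),
  res = list(
    tibble(term = "t1", estimate = 0.5),
    tibble(term = "t2", estimate = 1.5)
  )
)

# ..3 is the nested tibble; ..1 and ..2 are the key values of that row
out <- nested %>%
  pmap(~ ..3 %>% mutate(g = ..1, h = ..2)) %>%
  list_rbind()
```

The result out has one row per row of the nested tibbles, with the key columns g and h appended.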

Upvotes: 3
