andrew_reece

Reputation: 21274

Lost column name when applying lm with summarise/across

I want to use summarise/across with lm to fit regressions using different columns in a tibble. Like this:

library(tidyverse)
library(broom)

fits <- tibble(mtcars) %>% 
  summarise(across(c(vs, am), ~list(tidy(lm(wt ~ .x + mpg))))) 

But the columns that get passed into lm as .x end up labeled as .x in the regression output, instead of by their actual column names:

fits %>% unnest(vs)

# A tibble: 3 x 6
  term        estimate std.error statistic  p.value am              
  <chr>          <dbl>     <dbl>     <dbl>    <dbl> <list>          
1 (Intercept)   6.10      0.353     17.3   8.36e-17 <tibble [3 × 5]>
2 .x            0.0738    0.239      0.308 7.60e- 1 <tibble [3 × 5]>
3 mpg          -0.145     0.0200    -7.24  5.63e- 8 <tibble [3 × 5]>
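
For context, here is a minimal base-R sketch of why the label comes out as .x: lm() names each coefficient after the term as it is written in the formula, not after the column the values came from. The global .x assignment below is only an illustrative stand-in for what across() hands to the lambda.

```r
# Illustration only: bind a column's values to the name `.x`,
# roughly mimicking what the across() lambda sees.
.x <- mtcars$vs

# lm() takes coefficient labels from the formula text, so the term is ".x".
fit_x <- lm(wt ~ .x + mpg, data = mtcars)
names(coef(fit_x))
#> [1] "(Intercept)" ".x"          "mpg"
```

So by the time the formula is evaluated, lm() only ever sees the symbol .x, which is why the original column name does not show up in the term column.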

I can preserve the name if I build the lm formula on the fly using cur_column(), but this feels kludgy:

tibble(mtcars) %>% 
  summarise(across(c(vs, am), 
           ~list(tidy(lm(formula(paste0("wt ~ ", cur_column(), " + mpg"))))))) %>% 
  unnest(vs)

# A tibble: 3 x 6
  term        estimate std.error statistic  p.value am              
  <chr>          <dbl>     <dbl>     <dbl>    <dbl> <list>          
1 (Intercept)   6.10      0.353     17.3   8.36e-17 <tibble [3 × 5]>
2 vs            0.0738    0.239      0.308 7.60e- 1 <tibble [3 × 5]>
3 mpg          -0.145     0.0200    -7.24  5.63e- 8 <tibble [3 × 5]>

I want the output to use the actual column name in place of .x, without this workaround, while still using the summarise/across pattern and without bringing in map.

Seems like this should be possible. Any suggestions?

Edit: copying my comment from @akrun's answer to clarify what I'm looking for:

What I really want to know is whether the column name is preserved in the summarise/across operation in a way that I can reference directly in lm, with something like {{.x}} or rlang::as_name(.x). I know those don't work, but it seems like the name information should be available as more than just the string from cur_column().

Upvotes: 0

Views: 47

Answers (1)

akrun

Reputation: 887651

You can make it shorter with reformulate(), which builds the formula from a character vector of term labels and a response name:

library(dplyr)
library(broom)
library(tidyr)
tibble(mtcars) %>%
   summarise(across(c(vs, am), ~
     list(tidy(lm(reformulate(c(cur_column(), "mpg"), "wt")))))) %>%
   unnest(vs)

Output:

# A tibble: 3 x 6
#  term        estimate std.error statistic  p.value am              
#  <chr>          <dbl>     <dbl>     <dbl>    <dbl> <list>          
#1 (Intercept)   6.10      0.353     17.3   8.36e-17 <tibble [3 × 5]>
#2 vs            0.0738    0.239      0.308 7.60e- 1 <tibble [3 × 5]>
#3 mpg          -0.145     0.0200    -7.24  5.63e- 8 <tibble [3 × 5]>
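
As a quick check of what reformulate() returns, here is a minimal base-R sketch outside of summarise/across. Fitting against mtcars directly via data = is an assumption made for illustration, not part of the pipeline above.

```r
# reformulate() builds a formula from term labels plus a response name.
f <- reformulate(c("vs", "mpg"), response = "wt")
print(f)

# Because the formula contains the real column name, lm() labels the term "vs".
fit_vs <- lm(f, data = mtcars)
names(coef(fit_vs))
#> [1] "(Intercept)" "vs"          "mpg"
```

Inside across(), cur_column() supplies the "vs" or "am" string, so each fit gets a formula that spells out the current column.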

Upvotes: 1
