Jack Landry
Jack Landry

Reputation: 148

Loop or use apply function to make new variables using custom function

I wrote a function to make a new variable based on the level of another variable. I want to use that function repeadely to make a bunch of new variables, and then use them in different regressions. I'm wondering what the most easily understandable and flexible way to do this is, using either a for loop or some kind of apply function.

library(tidyverse)
library(estimatr)
treatment_maker <- function(magnitude_change_level) {
  mtcars %>% mutate("treatment_magnitude_{{magnitude_change_level}}":=ifelse(cyl>magnitude_change_level,1,0))
}
#Now I want to loop over the magnitude change levels from 5 to 7, rather than copy and pasting
mtcars <- treatment_maker(5)
mtcars <- treatment_maker(6)
#Then run a seperate regression on each variable
lm_robust(mpg ~ treatment_magnitude_5, data = mtcars) %>% tidy()
lm_robust(mpg ~ treatment_magnitude_6, data = mtcars) %>% tidy()

Upvotes: 2

Views: 204

Answers (1)

akrun
akrun

Reputation: 887048

We can use map to loop over the magnitude levels, create the binary column with 'treatment_maker', apply the lm_robust, get the tidy output and bind the list elements to a single dataset with _dfr

library(purrr)
library(dplyr)
library(broom)
library(tidyr)
library(stringr)
library(estimatr)

-changed the function so that it can work in the loop

treatment_maker <- function(magnitude_change_level) {
    
   mtcars %>% 
      mutate(!!paste0("treatment_magnitude_", magnitude_change_level):=
         as.integer(cyl > magnitude_change_level))
 }

Now, apply the function

map_dfr(as.numeric(1:10), ~ treatment_maker(.x) %>%
     summarise(out = list(lm_robust(reformulate(str_c('treatment_magnitude_', .x),
         response = 'mpg'), data = cur_data()) %>% tidy)) %>%
       unnest(c(out)))

-output

# A tibble: 20 x 9
   term                   estimate std.error statistic   p.value conf.low conf.high    df outcome
   <chr>                     <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <dbl> <chr>  
 1 (Intercept)               20.1       1.07     18.9   1.53e-18     17.9     22.3     31 mpg    
 2 treatment_magnitude_1     NA        NA        NA    NA            NA       NA       NA mpg    
 3 (Intercept)               20.1       1.07     18.9   1.53e-18     17.9     22.3     31 mpg    
 4 treatment_magnitude_2     NA        NA        NA    NA            NA       NA       NA mpg    
 5 (Intercept)               20.1       1.07     18.9   1.53e-18     17.9     22.3     31 mpg    
 6 treatment_magnitude_3     NA        NA        NA    NA            NA       NA       NA mpg    
 7 (Intercept)               26.7       1.36     19.6   1.17e-18     23.9     29.4     30 mpg    
 8 treatment_magnitude_4    -10.0       1.52     -6.57  2.84e- 7    -13.1     -6.90    30 mpg    
 9 (Intercept)               26.7       1.36     19.6   1.17e-18     23.9     29.4     30 mpg    
10 treatment_magnitude_5    -10.0       1.52     -6.57  2.84e- 7    -13.1     -6.90    30 mpg    
11 (Intercept)               24.0       1.17     20.4   3.68e-19     21.6     26.4     30 mpg    
12 treatment_magnitude_6     -8.87      1.36     -6.53  3.17e- 7    -11.6     -6.10    30 mpg    
13 (Intercept)               24.0       1.17     20.4   3.68e-19     21.6     26.4     30 mpg    
14 treatment_magnitude_7     -8.87      1.36     -6.53  3.17e- 7    -11.6     -6.10    30 mpg    
15 (Intercept)               20.1       1.07     18.9   1.53e-18     17.9     22.3     31 mpg    
16 treatment_magnitude_8     NA        NA        NA    NA            NA       NA       NA mpg    
17 (Intercept)               20.1       1.07     18.9   1.53e-18     17.9     22.3     31 mpg    
18 treatment_magnitude_9     NA        NA        NA    NA            NA       NA       NA mpg    
19 (Intercept)               20.1       1.07     18.9   1.53e-18     17.9     22.3     31 mpg    
20 treatment_magnitude_10    NA        NA        NA    NA            NA       NA       NA mpg    

checking with OP's code

mtcars1 <- treatment_maker(1)
> lm_robust(mpg ~ treatment_magnitude_1, data = mtcars1) %>% tidy()
1 coefficient  not defined because the design matrix is rank deficient

                   term estimate std.error statistic      p.value conf.low conf.high df outcome
1           (Intercept) 20.09062  1.065424  18.85693 1.526151e-18 17.91768  22.26357 31     mpg
2 treatment_magnitude_1       NA        NA        NA           NA       NA        NA NA     mpg

Or as another example

mtcars5 <- treatment_maker(5)
  lm_robust(mpg ~ treatment_magnitude_5, data = mtcars5) %>% tidy()
                   term  estimate std.error statistic      p.value  conf.low conf.high df outcome
1           (Intercept)  26.66364  1.359764 19.609015 1.171550e-18  23.88663 29.440645 30     mpg
2 treatment_magnitude_5 -10.01602  1.523651 -6.573696 2.839316e-07 -13.12773 -6.904307 30     mpg

Upvotes: 1

Related Questions