Reputation: 148
I wrote a function to make a new variable based on the level of another variable. I want to use that function repeadely to make a bunch of new variables, and then use them in different regressions. I'm wondering what the most easily understandable and flexible way to do this is, using either a for loop or some kind of apply function.
library(tidyverse)
library(estimatr)
treatment_maker <- function(magnitude_change_level) {
mtcars %>% mutate("treatment_magnitude_{{magnitude_change_level}}":=ifelse(cyl>magnitude_change_level,1,0))
}
#Now I want to loop over the magnitude change levels from 5 to 7, rather than copy and pasting
mtcars <- treatment_maker(5)
mtcars <- treatment_maker(6)
#Then run a seperate regression on each variable
lm_robust(mpg ~ treatment_magnitude_5, data = mtcars) %>% tidy()
lm_robust(mpg ~ treatment_magnitude_6, data = mtcars) %>% tidy()
Upvotes: 2
Views: 204
Reputation: 887048
We can use map
to loop over the magnitude levels, create the binary column with 'treatment_maker', apply the lm_robust
, get the tidy
output and bind the list
elements to a single dataset with _dfr
library(purrr)
library(dplyr)
library(broom)
library(tidyr)
library(stringr)
library(estimatr)
-changed the function so that it can work in the loop
treatment_maker <- function(magnitude_change_level) {
mtcars %>%
mutate(!!paste0("treatment_magnitude_", magnitude_change_level):=
as.integer(cyl > magnitude_change_level))
}
Now, apply the function
map_dfr(as.numeric(1:10), ~ treatment_maker(.x) %>%
summarise(out = list(lm_robust(reformulate(str_c('treatment_magnitude_', .x),
response = 'mpg'), data = cur_data()) %>% tidy)) %>%
unnest(c(out)))
-output
# A tibble: 20 x 9
term estimate std.error statistic p.value conf.low conf.high df outcome
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 (Intercept) 20.1 1.07 18.9 1.53e-18 17.9 22.3 31 mpg
2 treatment_magnitude_1 NA NA NA NA NA NA NA mpg
3 (Intercept) 20.1 1.07 18.9 1.53e-18 17.9 22.3 31 mpg
4 treatment_magnitude_2 NA NA NA NA NA NA NA mpg
5 (Intercept) 20.1 1.07 18.9 1.53e-18 17.9 22.3 31 mpg
6 treatment_magnitude_3 NA NA NA NA NA NA NA mpg
7 (Intercept) 26.7 1.36 19.6 1.17e-18 23.9 29.4 30 mpg
8 treatment_magnitude_4 -10.0 1.52 -6.57 2.84e- 7 -13.1 -6.90 30 mpg
9 (Intercept) 26.7 1.36 19.6 1.17e-18 23.9 29.4 30 mpg
10 treatment_magnitude_5 -10.0 1.52 -6.57 2.84e- 7 -13.1 -6.90 30 mpg
11 (Intercept) 24.0 1.17 20.4 3.68e-19 21.6 26.4 30 mpg
12 treatment_magnitude_6 -8.87 1.36 -6.53 3.17e- 7 -11.6 -6.10 30 mpg
13 (Intercept) 24.0 1.17 20.4 3.68e-19 21.6 26.4 30 mpg
14 treatment_magnitude_7 -8.87 1.36 -6.53 3.17e- 7 -11.6 -6.10 30 mpg
15 (Intercept) 20.1 1.07 18.9 1.53e-18 17.9 22.3 31 mpg
16 treatment_magnitude_8 NA NA NA NA NA NA NA mpg
17 (Intercept) 20.1 1.07 18.9 1.53e-18 17.9 22.3 31 mpg
18 treatment_magnitude_9 NA NA NA NA NA NA NA mpg
19 (Intercept) 20.1 1.07 18.9 1.53e-18 17.9 22.3 31 mpg
20 treatment_magnitude_10 NA NA NA NA NA NA NA mpg
checking with OP's code
mtcars1 <- treatment_maker(1)
> lm_robust(mpg ~ treatment_magnitude_1, data = mtcars1) %>% tidy()
1 coefficient not defined because the design matrix is rank deficient
term estimate std.error statistic p.value conf.low conf.high df outcome
1 (Intercept) 20.09062 1.065424 18.85693 1.526151e-18 17.91768 22.26357 31 mpg
2 treatment_magnitude_1 NA NA NA NA NA NA NA mpg
Or as another example
mtcars5 <- treatment_maker(5)
lm_robust(mpg ~ treatment_magnitude_5, data = mtcars5) %>% tidy()
term estimate std.error statistic p.value conf.low conf.high df outcome
1 (Intercept) 26.66364 1.359764 19.609015 1.171550e-18 23.88663 29.440645 30 mpg
2 treatment_magnitude_5 -10.01602 1.523651 -6.573696 2.839316e-07 -13.12773 -6.904307 30 mpg
Upvotes: 1