Reputation: 2095
With data like this:
library(tidyverse)

df <- tibble(x = runif(200), y = runif(200, 0, 3),
             is_active = sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.2, 0.8)),
             var1 = sample(c(0, 1), 200, TRUE), var2 = sample(c(0, 1), 200, TRUE))
# A tibble: 6 x 5
       x     y is_active  var1  var2
   <dbl> <dbl>     <dbl> <dbl> <dbl>
1 0.0812 2.42          0     0     0
2 0.313  1.61          0     1     1
3 0.241  2.90          1     0     0
4 0.906  1.08          1     0     1
5 0.652  2.86          0     0     0
6 0.231  0.730         1     1     0
...
I want to calculate the proportion of the is_active column only for those observations where var1 == 1, then for those where var2 == 1, and so on. I have written a function that works for one variable:
f <- function(df, var) {
  var <- ensym(var)
  df %>%
    filter(!!var == 1) %>%
    mutate(xcut = cut(x, breaks = 10),
           ycut = cut(y, breaks = 20)) %>%
    group_by(xcut, ycut) %>%
    summarise(!!paste(var, 'proportion', sep = '_') := mean(is_active)) %>%
    ungroup()
}
And calling it as below works fine:
f(df, var1)
f(df, var2)
The issue is that I have hundreds of columns like var1 and var2, and I'd like to iterate over all of them, calculating the proportion of is_active for each one. map_at(df, vars(var1, var2), f) doesn't work here because it applies f to each selected column (a vector) in turn rather than passing the whole data frame to each call. How can I achieve this, preferably with the purrr package?
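To make the map_at() behaviour concrete, here is a small sketch on a made-up four-row tibble (the toy data and the name `out` are illustrative assumptions, not the real data). It shows that the function only ever sees one column at a time:

```r
library(purrr)
library(tibble)

# Toy data for illustration only
toy <- tibble(x = c(0.1, 0.9, 0.4, 0.7),
              is_active = c(1, 0, 1, 1),
              var1 = c(0, 1, 1, 0),
              var2 = c(1, 1, 0, 0))

# map_at() treats the tibble as a list of columns: the function receives
# each selected column as a plain vector, never the data frame itself
out <- map_at(toy, c("var1", "var2"),
              ~ paste("received a", class(.x), "vector of length", length(.x)))

out$var1  # a description of the column vector that was passed in
out$x     # untouched columns are returned as they were
```

Because the function never receives the other columns, it cannot filter rows or average is_active from inside map_at().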
Upvotes: 2
Views: 113
Reputation: 69
I would reshape the data to long format and compute the proportion for every matching column in one pass, something like this:
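library(tidyverse)

calc_pct_isactive <- function(df, regex_col = "^var") {
  df %>%
    # Stack all matching columns into name/value pairs
    pivot_longer(cols = matches(regex_col)) %>%
    group_by(is_active, name, value) %>%
    tally(name = "count") %>%
    # The base is all rows sharing the same column and value, so that
    # pct is the share of is_active among rows where that column == value
    group_by(name, value) %>%
    mutate(base = sum(count, na.rm = TRUE),
           pct = count / base) %>%
    ungroup() %>%
    filter(is_active == 1, value == 1)
}
calc_pct_isactive(df)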
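The same long-format idea can also keep the x/y bins from the question. The following is only a sketch under a stated assumption: the bins are computed once on the full data, so they are shared by all variables, whereas f in the question cuts after filtering and therefore gets different breaks per variable. The names `long_props`, `variable` and `proportion` are illustrative.

```r
library(tidyverse)

set.seed(1)  # illustrative seed so the sketch is reproducible
df <- tibble(x = runif(200), y = runif(200, 0, 3),
             is_active = sample(c(0, 1), 200, TRUE, prob = c(0.2, 0.8)),
             var1 = sample(c(0, 1), 200, TRUE),
             var2 = sample(c(0, 1), 200, TRUE))

long_props <- df %>%
  # Assumption: bin on the full data so every variable shares the same bins
  mutate(xcut = cut(x, breaks = 10),
         ycut = cut(y, breaks = 20)) %>%
  # One row per (observation, variable) pair
  pivot_longer(cols = matches("^var"), names_to = "variable") %>%
  # Keep only observations where the variable equals 1
  filter(value == 1) %>%
  group_by(variable, xcut, ycut) %>%
  summarise(proportion = mean(is_active), .groups = "drop")

long_props
```

This returns one long tibble rather than one tibble per variable, which may be easier to plot or join when there are hundreds of columns.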
Upvotes: 0
Reputation: 389235
You could pass the column name to your function as a string and modify the function slightly:
library(tidyverse)
f <- function(df, var) {
  df %>%
    filter(!!sym(var) == 1) %>%
    mutate(xcut = cut(x, breaks = 10),
           ycut = cut(y, breaks = 20)) %>%
    group_by(xcut, ycut) %>%
    summarise(!!paste(var, 'proportion', sep = '_') := mean(is_active)) %>%
    ungroup()
}
You can then map over the column names, passing the data frame as a fixed argument:
map(c('var1', 'var2'), f, df = df)
#[[1]]
# A tibble: 2 x 3
#  xcut          ycut          var1_proportion
#  <fct>         <fct>                   <dbl>
#1 (0.231,0.239] (0.729,0.774]               1
#2 (0.305,0.313] (1.57,1.61]                 0

#[[2]]
# A tibble: 2 x 3
#  xcut          ycut        var2_proportion
#  <fct>         <fct>                 <dbl>
#1 (0.312,0.372] (1.58,1.61]               0
#2 (0.847,0.907] (1.08,1.11]               1
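With hundreds of columns, the names need not be typed by hand. A minimal sketch, assuming every column of interest starts with "var" (that pattern and the names `vars_to_check` and `results` are illustrative, not from the question), reusing the string-based f:

```r
library(tidyverse)

set.seed(1)  # illustrative seed so the sketch is reproducible
df <- tibble(x = runif(200), y = runif(200, 0, 3),
             is_active = sample(c(0, 1), 200, TRUE, prob = c(0.2, 0.8)),
             var1 = sample(c(0, 1), 200, TRUE),
             var2 = sample(c(0, 1), 200, TRUE))

# String-based f, as in this answer (.groups = "drop" stands in for ungroup())
f <- function(df, var) {
  df %>%
    filter(!!sym(var) == 1) %>%
    mutate(xcut = cut(x, breaks = 10),
           ycut = cut(y, breaks = 20)) %>%
    group_by(xcut, ycut) %>%
    summarise(!!paste(var, "proportion", sep = "_") := mean(is_active),
              .groups = "drop")
}

# Assumption: the columns to iterate over all match "^var"
vars_to_check <- str_subset(names(df), "^var")

# set_names() names each list element after the column it was computed from
results <- map(set_names(vars_to_check), f, df = df)

names(results)
results$var1
```

Naming the list makes it possible to look up a result by column name instead of by position.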
Upvotes: 2