Use vector of columns in custom dplyr function

Question

I'm trying to use the tidyverse to make a function that allows me to take a vector of column names (of class factor), count how many observations meet a certain condition (value == "yes"), mutate() and make new columns using this sum so I can summarise the data later.

I have written a function that can do this for a single column, but I want to be able to pass a vector of column names of any length using all_of() grammar to the function. When I try this, I get a new column with the same name as the vector (including my _count suffix) not the values inside the vector.

This is similar to TimTeaFan's Post but I would like to pass a vector where length > 1.

I think I might need to use the (. . .) option in dplyr and some more rlang, but I haven't quite figured out the right combination. The function doesn't work when I substitute (. . .) for objective thanks to NSE. I've also tried variations of rlang::as_name() and rlang::get_env(). There was an outdated workbook that used purrr::map() as well, but I haven't had any luck implementing it here.

I either got Error: Column x is not found. or Error: Promise has already been forced

Here is a reproducible example with data

dat <- tibble(category = rep(letters[1:10], 2),
              intake = factor(c(rep("no", 12), rep("yes", 8))), 
              outtake = factor(c(rep("yes", 11), rep("no", 9))),
              pretake = factor(c(rep(c("no", "yes"), 10))))

yessum <- function(.data, objective) {
  .data %>%
    dplyr::mutate("{{objective}}_count" := sum(
      ifelse(
        unlist(!!rlang::ensym(objective)) == "yes", 1, 0)))
}

dat %>%
  group_by(category) %>%
  yessum(intake)

I want to be able to pass a vector of certain column names to yessum and receive a set of new columns just like intake_new but named outtake_new and pretake_new.

Here's what currently happens when I try that:

vars <- c("intake", "outtake", "pretake")

dat %>%
  group_by(category) %>%
  yessum(vars)

Any and all help is welcome!

AndrewGB · Accepted Answer

You don't necessarily need the function, as you can just mutate across the columns and get sums for each category.

library(tidyverse)

dat %>%
  group_by(category) %>%
  mutate(across(ends_with("take"), .fns = list(count = ~sum(. == "yes"))))

Or if you have a long list, then you can use vars directly in the across statement:

vars <- c("intake", "outtake", "pretake")

dat %>%
  group_by(category) %>%
  mutate(across(vars, .fns = list(count = ~sum(. == "yes"))))

Output

  category intake outtake pretake intake_count outtake_count pretake_count
                                        
 1 a        no     yes     no                 0             2             0
 2 b        no     yes     yes                0             1             2
 3 c        no     yes     no                 1             1             0
 4 d        no     yes     yes                1             1             2
 5 e        no     yes     no                 1             1             0
 6 f        no     yes     yes                1             1             2
 7 g        no     yes     no                 1             1             0
 8 h        no     yes     yes                1             1             2
 9 i        no     yes     no                 1             1             0
10 j        no     yes     yes                1             1             2
11 a        no     yes     no                 0             2             0
12 b        no     no      yes                0             1             2
13 c        yes    no      no                 1             1             0
14 d        yes    no      yes                1             1             2
15 e        yes    no      no                 1             1             0
16 f        yes    no      yes                1             1             2
17 g        yes    no      no                 1             1             0
18 h        yes    no      yes                1             1             2
19 i        yes    no      no                 1             1             0
20 j        yes    no      yes                1             1             2

Use vector of columns in custom dplyr function

Answers (1)

Related Questions