asachet

Reputation: 6921

intersection of dplyr select helpers

I want to pass a selection of columns of a data.frame to dplyr's xxxx_at functions via the .vars argument, but I want the intersection of several selections rather than their union.

Here is an example: a data.frame with names of the form [abc][abc][abc].

df <- structure(list(aaa = 1L, baa = 2L, caa = 3L, aba = 4L, bba = 5L, 
    cba = 6L, aca = 7L, bca = 8L, cca = 9L, aab = 10L, bab = 11L, 
    cab = 12L, abb = 13L, bbb = 14L, cbb = 15L, acb = 16L, bcb = 17L, 
    ccb = 18L, aac = 19L, bac = 20L, cac = 21L, abc = 22L, bbc = 23L, 
    cbc = 24L, acc = 25L, bcc = 26L, ccc = 27L), class = "data.frame", row.names = c(NA, 
-1L))


# names(df)
# [1] "aaa" "baa" "caa" "aba" "bba" "cba" "aca" "bca" "cca" "aab" "bab" "cab" "abb" "bbb" "cbb" "acb" "bcb"
# [18] "ccb" "aac" "bac" "cac" "abc" "bbc" "cbc" "acc" "bcc" "ccc"

I want to select, in one go, the columns starting with "a" and ending with "c". In order to use the solution with mutate_at, group_by_at, and_so_on_at, it needs to fit inside a single call to vars.

Using several conditions in vars takes the union of them and not the intersection.

df %>% 
select_at(vars(starts_with("a"), ends_with("c"))) %>%
names

# [1] "aaa" "aba" "aca" "aab" "abb" "acb" "aac" "abc" "acc" "bac" "cac" "bbc" "cbc" "bcc" "ccc"

I am trying to get:

[1] "aac" "abc" "acc"

I have a feeling all_vars is relevant but I could not figure out how to use it.

PS: I know I can use select instead of select_at, but I am trying to be general. My actual use case is with mutate_at.

Upvotes: 3

Views: 732

Answers (2)

Andre Elrico

Reputation: 11500

grep("^a.*c$", names(df), value = TRUE)

#[1] "aac" "abc" "acc"

If you insist on using dplyr:

df %>% 
    select_at(vars(matches("^a.*c$"))) %>%
    names

#[1] "aac" "abc" "acc"
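
Since the question's real use case is mutate_at, here is a minimal sketch of how the same anchored regex might be used there. The toy data frame and the doubling function are assumptions made up for illustration, not part of the original question.

```r
library(dplyr)

# Hypothetical toy data: only "aac" both starts with "a" and ends with "c"
toy <- data.frame(aac = 1, abb = 2, bbc = 3)

# matches() takes a regular expression; anchoring both ends with ^ and $
# expresses "starts with a AND ends with c" in a single helper
res <- toy %>%
  mutate_at(vars(matches("^a.*c$")), function(x) x * 2)
```

The character vector returned by grep(..., value = TRUE) should also be usable directly as the .vars argument, since the scoped verbs accept column names as well as vars().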

Upvotes: 4

FloSchmo

Reputation: 758

starts_with and ends_with both evaluate to column positions, i.e. integer vectors of column indices. To apply both conditions at once, take the intersection of those indices by calling intersect on the return values of starts_with and ends_with:

df %>% 
  select_at(vars(intersect(starts_with("a"), ends_with("c")))) %>%
  names
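
The same idea should carry over to mutate_at, which is the use case mentioned in the question. The sketch below uses a made-up toy data frame and a made-up scaling function; the second variant assumes dplyr 1.0.0 or later (with tidyselect 1.0.0 or later), where selection helpers can be combined with `&` inside across().

```r
library(dplyr)

# Hypothetical toy data: only "aac" satisfies both conditions
toy <- data.frame(aac = 1, abb = 2, bbc = 3)

# intersect() of the two position vectors, inside a single vars() call
res_at <- toy %>%
  mutate_at(vars(intersect(starts_with("a"), ends_with("c"))),
            function(x) x * 10)

# Assumes dplyr >= 1.0.0: `&` takes the intersection of two selections
res_across <- toy %>%
  mutate(across(starts_with("a") & ends_with("c"), function(x) x * 10))
```

Both variants are meant to modify only the columns matched by both helpers and leave the others untouched.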

Upvotes: 4
