Brad Cannell
Brad Cannell

Reputation: 3200

Arrange calculated columns immediately after source columns when using dplyr column-wise operations

I'm trying to arrange calculated columns immediately after source columns when using dplyr column-wise operations

library(dplyr)

Example data

df <- tibble(
  id         = c(1, 2, 2),
  id_row     = c(1, 1, 2),
  name_first = c("John", "Jane", "Jane"),
  city       = c("NY", "DAL", "HOU"),
  x          = c(0, 1, 0)
)
# A tibble: 3 x 5
     id id_row name_first city      x
  <dbl>  <dbl> <chr>      <chr> <dbl>
1     1      1 John       NY        0
2     2      1 Jane       DAL       1
3     2      2 Jane       HOU       0

Within ID, the values of name_first and city should be constant. The values for id_row and x need not be constant. I want to test for and inspect differing values of name_first and city within-id.

One convenient way to do this is with mutate() and across().

df %>% 
  group_by(id) %>% 
  mutate(
    across(
      .cols  = c(name_first, city),
      .fns   = ~ length(unique(.x)) == 1,
      .names = "{col}_all_match"
    )
  )
# A tibble: 3 x 7
# Groups:   id [2]
     id id_row name_first city      x name_first_all_match city_all_match
  <dbl>  <dbl> <chr>      <chr> <dbl> <lgl>                <lgl>         
1     1      1 John       NY        0 TRUE                 TRUE          
2     2      1 Jane       DAL       1 TRUE                 FALSE         
3     2      2 Jane       HOU       0 TRUE                 FALSE

The issue is that the "all_match" (calculated) columns are added to the far right of the data frame rather than immediately after their source column. This makes it difficult to visually inspect differing values for the columns of interest.

Of course, in this small data frame, we could easily rearrange the columns using select().

df %>% 
  group_by(id) %>% 
  mutate(
    across(
      .cols  = c(name_first, city),
      .fns   = ~ length(unique(.x)) == 1,
      .names = "{col}_all_match"
    )
  ) %>% 
  select(id, id_row, starts_with("name_first"), starts_with("city"), x)
# A tibble: 3 x 7
# Groups:   id [2]
     id id_row name_first name_first_all_match city  city_all_match     x
  <dbl>  <dbl> <chr>      <lgl>                <chr> <lgl>          <dbl>
1     1      1 John       TRUE                 NY    TRUE               0
2     2      1 Jane       TRUE                 DAL   FALSE              1
3     2      2 Jane       TRUE                 HOU   FALSE              0

The issue with that approach is that it quickly becomes pretty cumbersome with more columns. A more tractable approach would be to sort the names alphabetically...

df %>% 
  group_by(id) %>% 
  mutate(
    across(
      .cols  = c(name_first, city),
      .fns   = ~ length(unique(.x)) == 1,
      .names = "{col}_all_match"
    )
  ) %>% 
  select(sort(names(.)))
# A tibble: 3 x 7
# Groups:   id [2]
  city  city_all_match    id id_row name_first name_first_all_match     x
  <chr> <lgl>          <dbl>  <dbl> <chr>      <lgl>                <dbl>
1 NY    TRUE               1      1 John       TRUE                     0
2 DAL   FALSE              2      1 Jane       TRUE                     1
3 HOU   FALSE              2      2 Jane       TRUE                     0

...but in my situation I need to preserve the original column order. I'd also prefer to stick with Tidyverse solutions if possible.

Any ideas are appreciated!

Also posted on RStudio Community

Upvotes: 1

Views: 466

Answers (1)

LMc
LMc

Reputation: 18682

Since you (obviously) know which columns you want to mutate, you can leverage that ahead of time to create a column order:

cols <- c("name_first", "city")
col_order <- unlist(lapply(names(df), function(x) if (x %in% cols) c(x, paste0(x, "_all_match")) else x))

df %>% 
  group_by(id) %>% 
  mutate(
    across(
      .cols  = cols,
      .fns   = ~ length(unique(.x)) == 1,
      .names = "{col}_all_match"
    )
  ) %>% 
  dplyr::select(col_order)

Output

     id id_row name_first name_first_all_match city  city_all_match     x
  <dbl>  <dbl> <chr>      <lgl>                <chr> <lgl>          <dbl>
1     1      1 John       TRUE                 NY    TRUE               0
2     2      1 Jane       TRUE                 DAL   FALSE              1
3     2      2 Jane       TRUE                 HOU   FALSE              0

This is an issue that's been raised before in mutate and there exists an experimental .after and .before that works with a single variable. When these arguments are used with across, all newly created variables will be positioned together before or after the variable specified, which is not what you want.

Though I think it would be helpful to be able to insert a column after its source, there isn't really tidyverse functionality for this.

Upvotes: 1

Related Questions