Reputation: 3200
I'm trying to arrange calculated columns immediately after source columns when using dplyr column-wise operations
library(dplyr)
Example data
df <- tibble(
id = c(1, 2, 2),
id_row = c(1, 1, 2),
name_first = c("John", "Jane", "Jane"),
city = c("NY", "DAL", "HOU"),
x = c(0, 1, 0)
)
# A tibble: 3 x 5
id id_row name_first city x
<dbl> <dbl> <chr> <chr> <dbl>
1 1 1 John NY 0
2 2 1 Jane DAL 1
3 2 2 Jane HOU 0
Within ID, the values of name_first and city should be constant. The values for id_row and x need not be constant. I want to test for and inspect differing values of name_first and city within-id.
One convenient way to do this is with mutate()
and across()
.
df %>%
group_by(id) %>%
mutate(
across(
.cols = c(name_first, city),
.fns = ~ length(unique(.x)) == 1,
.names = "{col}_all_match"
)
)
# A tibble: 3 x 7
# Groups: id [2]
id id_row name_first city x name_first_all_match city_all_match
<dbl> <dbl> <chr> <chr> <dbl> <lgl> <lgl>
1 1 1 John NY 0 TRUE TRUE
2 2 1 Jane DAL 1 TRUE FALSE
3 2 2 Jane HOU 0 TRUE FALSE
The issue is that the "all_match" (calculated) columns are added to the far right of the data frame rather than immediately after their source column. This makes it difficult to visually inspect differing values for the columns of interest.
Of course, in this small data frame, we could easily rearrange the columns using select()
.
df %>%
group_by(id) %>%
mutate(
across(
.cols = c(name_first, city),
.fns = ~ length(unique(.x)) == 1,
.names = "{col}_all_match"
)
) %>%
select(id, id_row, starts_with("name_first"), starts_with("city"), x)
# A tibble: 3 x 7
# Groups: id [2]
id id_row name_first name_first_all_match city city_all_match x
<dbl> <dbl> <chr> <lgl> <chr> <lgl> <dbl>
1 1 1 John TRUE NY TRUE 0
2 2 1 Jane TRUE DAL FALSE 1
3 2 2 Jane TRUE HOU FALSE 0
The issue with that approach is that it quickly becomes pretty cumbersome with more columns. A more tractable approach would be to sort the names alphabetically...
df %>%
group_by(id) %>%
mutate(
across(
.cols = c(name_first, city),
.fns = ~ length(unique(.x)) == 1,
.names = "{col}_all_match"
)
) %>%
select(sort(names(.)))
# A tibble: 3 x 7
# Groups: id [2]
city city_all_match id id_row name_first name_first_all_match x
<chr> <lgl> <dbl> <dbl> <chr> <lgl> <dbl>
1 NY TRUE 1 1 John TRUE 0
2 DAL FALSE 2 1 Jane TRUE 1
3 HOU FALSE 2 2 Jane TRUE 0
...but in my situation I need to preserve the original column order. I'd also prefer to stick with Tidyverse solutions if possible.
Any ideas are appreciated!
Also posted on RStudio Community
Upvotes: 1
Views: 466
Reputation: 18682
Since you (obviously) know which columns you want to mutate, you can leverage that ahead of time to create a column order:
cols <- c("name_first", "city")
col_order <- unlist(lapply(names(df), function(x) if (x %in% cols) c(x, paste0(x, "_all_match")) else x))
df %>%
group_by(id) %>%
mutate(
across(
.cols = cols,
.fns = ~ length(unique(.x)) == 1,
.names = "{col}_all_match"
)
) %>%
dplyr::select(col_order)
Output
id id_row name_first name_first_all_match city city_all_match x
<dbl> <dbl> <chr> <lgl> <chr> <lgl> <dbl>
1 1 1 John TRUE NY TRUE 0
2 2 1 Jane TRUE DAL FALSE 1
3 2 2 Jane TRUE HOU FALSE 0
This is an issue that's been raised before in mutate
and there exists an experimental .after
and .before
that works with a single variable. When these arguments are used with across
, all newly created variables will be positioned together before or after the variable specified, which is not what you want.
Though I think it would be helpful to be able to insert a column after its source, there isn't really tidyverse functionality for this.
Upvotes: 1