Reputation: 832
My dataset contains sets of three columns: "varn_name", "varn_desc", and "varn_cats", where n is a sequence of consecutive integers from 1. I'd like to order these columns so that they follow this order (i.e. "var1_name", "var1_desc", "var1_name", "var2_name", "var2_desc", etc). n varies between datasets and has no upper limit, so I'm hoping for a solution that accommodates this variability.
Example of input data (with n up to 3):
# unsorted
dat <- structure(list(id_col = 1:2, var2_name = c("A", "B"), var1_cats = c("A",
"B"), var3_desc = c("A", "B"), var3_cats = c("A", "B"), var3_name = c("A",
"B"), var2_cats = c("A", "B"), var1_name = c("A", "B"), var1_desc = c("A",
"B"), var2_desc = c("A", "B")), class = "data.frame", row.names = c(NA,
-2L))
Desired output:
# sorted
#> id_col var1_name var1_desc var1_cats var2_name var2_desc var2_cats var3_name
#> 1 1 A A A A A A A
#> 2 2 B B B B B B B
#> var3_desc var3_cats
#> 1 A A
#> 2 B B
The challenge I can't seem to overcome is that I want the n component (e.g. var1) to be ascending but the suffix component ("_name", "_desc", and "_cats") to not follow alphabetical order. Hence solutions like this one don't quite work for me.
Is there a simple one liner? Ideally using dplyr::relocate()
so I can make use of the .after
argument (in this case, .after = id_col
). I was hoping tidyselect
helpers might work, e.g. relocate(dat, matches("^var\\d_"))
, but I can't find a way.
Thanks!
Upvotes: 2
Views: 115
Reputation: 7540
An alternative tidyselect approach with num_range
and contains
:
library(dplyr)
dat |>
select(
id_col,
num_range("var", range = 1:3, "_name"),
num_range("var", range = 1:3, "_desc"),
num_range("var", range = 1:3, "_cats"),
) |>
select(id_col, contains(as.character(1:3)))
#> id_col var1_name var1_desc var1_cats var2_name var2_desc var2_cats var3_name
#> 1 1 A A A A A A A
#> 2 2 B B B B B B B
#> var3_desc var3_cats
#> 1 A A
#> 2 B B
Created on 2024-04-26 with reprex v2.1.0
Upvotes: 1
Reputation: 34441
Here's one way. Not sure you will get a particularly succinct approach as the column names need to be split and ordered by piece.
library(tidyverse)
nm <- names(dat)[-1]
name_order <- nm[order(parse_number(nm),
match(str_split_i(nm, "_", 2), c("name", "desc", "cats")))]
dat |>
relocate(all_of(name_order), .after = id_col)
id_col var1_name var1_desc var1_cats var2_name var2_desc var2_cats var3_name var3_desc var3_cats
1 1 A A A A A A A A A
2 2 B B B B B B B B B
You can create a custom relocate function that can use tidyselect semantics:
my_relocate <- function(.data, vars, ..., suffix_order = c("name", "desc", "cats")) {
nm <- tidyselect::eval_select({{ vars }}, .data) |> names()
name_order <- nm[order(parse_number(nm),
match(str_split_i(nm, "_", 2), suffix_order))]
relocate(.data, all_of(name_order), ...)
}
dat |>
my_relocate(starts_with("var"), .after = id_col)
id_col var1_name var1_desc var1_cats var2_name var2_desc var2_cats var3_name var3_desc var3_cats
1 1 A A A A A A A A A
2 2 B B B B B B B B B
Upvotes: 3
Reputation: 66490
Not at all a one-liner, so I'm curious to see more concise approaches. I rely here on the order you want within each group to be reverse chronological; if it were more complicated I'd make it an ordered factor.
library(tidyverse)
cols_sorted <- data.frame(colnames = names(dat)) |>
separate(colnames, c("grp", "type")) |>
arrange(grp, desc(type)) |>
unite("colname", grp:type) |>
pull()
dat |>
select({{cols_sorted}})
Upvotes: 4