pyg
pyg

Reputation: 832

dynamic column relocating using tidyselect

My dataset contains sets of three columns: "varn_name", "varn_desc", and "varn_cats", where n is a sequence of consecutive integers from 1. I'd like to order these columns so that they follow this order (i.e. "var1_name", "var1_desc", "var1_name", "var2_name", "var2_desc", etc). n varies between datasets and has no upper limit, so I'm hoping for a solution that accommodates this variability.

Example of input data (with n up to 3):

# unsorted
dat <- structure(list(id_col = 1:2, var2_name = c("A", "B"), var1_cats = c("A", 
"B"), var3_desc = c("A", "B"), var3_cats = c("A", "B"), var3_name = c("A", 
"B"), var2_cats = c("A", "B"), var1_name = c("A", "B"), var1_desc = c("A", 
"B"), var2_desc = c("A", "B")), class = "data.frame", row.names = c(NA, 
-2L))

Desired output:

# sorted

#>   id_col var1_name var1_desc var1_cats var2_name var2_desc var2_cats var3_name
#> 1      1         A         A         A         A         A         A         A
#> 2      2         B         B         B         B         B         B         B
#>   var3_desc var3_cats
#> 1         A         A
#> 2         B         B

The challenge I can't seem to overcome is that I want the n component (e.g. var1) to be ascending but the suffix component ("_name", "_desc", and "_cats") to not follow alphabetical order. Hence solutions like this one don't quite work for me.

Is there a simple one liner? Ideally using dplyr::relocate() so I can make use of the .after argument (in this case, .after = id_col). I was hoping tidyselect helpers might work, e.g. relocate(dat, matches("^var\\d_")), but I can't find a way.

Thanks!

Upvotes: 2

Views: 115

Answers (3)

Carl
Carl

Reputation: 7540

An alternative tidyselect approach with num_range and contains:

library(dplyr)

dat |> 
  select(
    id_col,
    num_range("var", range = 1:3, "_name"),
    num_range("var", range = 1:3, "_desc"),
    num_range("var", range = 1:3, "_cats"),
    ) |>
  select(id_col, contains(as.character(1:3)))
#>   id_col var1_name var1_desc var1_cats var2_name var2_desc var2_cats var3_name
#> 1      1         A         A         A         A         A         A         A
#> 2      2         B         B         B         B         B         B         B
#>   var3_desc var3_cats
#> 1         A         A
#> 2         B         B

Created on 2024-04-26 with reprex v2.1.0

Upvotes: 1

lroha
lroha

Reputation: 34441

Here's one way. Not sure you will get a particularly succinct approach as the column names need to be split and ordered by piece.

library(tidyverse)

nm <- names(dat)[-1]
name_order <- nm[order(parse_number(nm),
                       match(str_split_i(nm, "_", 2), c("name", "desc", "cats")))]

dat |>
  relocate(all_of(name_order), .after = id_col)

  id_col var1_name var1_desc var1_cats var2_name var2_desc var2_cats var3_name var3_desc var3_cats
1      1         A         A         A         A         A         A         A         A         A
2      2         B         B         B         B         B         B         B         B         B

You can create a custom relocate function that can use tidyselect semantics:

my_relocate <- function(.data, vars, ..., suffix_order = c("name", "desc", "cats")) {
  nm <- tidyselect::eval_select({{ vars }}, .data) |> names()
  name_order <- nm[order(parse_number(nm),
                         match(str_split_i(nm, "_", 2), suffix_order))]
  relocate(.data, all_of(name_order), ...)
}


dat |>
  my_relocate(starts_with("var"), .after = id_col)

  id_col var1_name var1_desc var1_cats var2_name var2_desc var2_cats var3_name var3_desc var3_cats
1      1         A         A         A         A         A         A         A         A         A
2      2         B         B         B         B         B         B         B         B         B

Upvotes: 3

Jon Spring
Jon Spring

Reputation: 66490

Not at all a one-liner, so I'm curious to see more concise approaches. I rely here on the order you want within each group to be reverse chronological; if it were more complicated I'd make it an ordered factor.

library(tidyverse)
cols_sorted <- data.frame(colnames = names(dat)) |>
  separate(colnames, c("grp", "type")) |>
  arrange(grp, desc(type)) |>
  unite("colname", grp:type) |>
  pull()

dat |>
  select({{cols_sorted}})

Upvotes: 4

Related Questions