Andrei Maiseyeu
Andrei Maiseyeu

Reputation: 33

Reorder and Group Multiple Columns by Regex/pattern

I have the following:

a_aa a_ab a_ac b_aa b_ab b_ac
2    3    3    3     1    2
3    4    1    1     3    1

Desired outcome:

a_aa b_aa a_ab b_ab a_ac b_ac
2    3    3    1     3    2
3    1    4    3     1    1

Code with data:

d <- "a_aa a_ab a_ac b_aa   b_ab b_ac
2    3    3    3     1    2
3    4    1    1     3    1"
dd <- read.table(textConnection(object = d), header = T)

My current solution is manual:

    dd %>% select(a_aa, b_aa, a_ab, b_ab, a_ac, b_ac)

however, is onerous when number of columns is large. Any ideas how to do this kind of column ordering with grouping (e.g. sequence a_etc1, b_etc1, a_etc2, b_etc2)? Thank you!

Upvotes: 3

Views: 412

Answers (4)

hello_friend
hello_friend

Reputation: 5788

Base R solution:

dd[,order(gsub(".*\\_(\\w+$)", "\\1", names(dd)))]

Upvotes: 1

TarJae
TarJae

Reputation: 78917

Here is an alternative way:

We first create a function using two additional packages:

  1. stringi for its str_reverse function (advantage:it is vectorized)
  2. gtools for its mixedsort function

Then we apply it to our dataframe df with rename_with:

library(dplyr)

# the function
my_function <- function(df){
  x <- stringi::stri_reverse(colnames(df))
  y <- gtools::mixedsort(x)
  stringi::stri_reverse(y)
}


df %>% 
  rename_with(., ~my_function(df))
  a_aa b_aa a_ab b_ab a_ac b_ac
1    2    3    3    3    1    2
2    3    4    1    1    3    1

Upvotes: 1

B. Christian Kamgang
B. Christian Kamgang

Reputation: 6489

Here is one way to solve your problem:

dd[order(gsub(".+_", "", names(dd)))]

# or

dd %>%
  select(order(gsub(".+_", "", names(.))))


  a_aa b_aa a_ab b_ab a_ac b_ac
1    2    3    3    1    3    2
2    3    1    4    3    1    1

Upvotes: 6

AnilGoyal
AnilGoyal

Reputation: 26218

You may do something like this

library(tidyverse)

d <- "a_aa  a_ab    a_ac    b_aa    b_ab    b_ac
2   3   1   3   3   2
3   1   3   4   1   1"
dd <- read.table(textConnection(object = d), header = T)

colnames(dd) %>% 
  str_split("_") %>% 
  map_chr(~.x[2]) %>% 
  unique() -> vars

dd %>% 
  select(ends_with(all_of(vars)))
#>   a_aa b_aa a_ab b_ab a_ac b_ac
#> 1    2    3    3    3    1    2
#> 2    3    4    1    1    3    1

If you don't want to use other tidyverse libraries than dplyr, you can do

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

d <- "a_aa  a_ab    a_ac    b_aa    b_ab    b_ac
2   3   1   3   3   2
3   1   3   4   1   1"
dd <- read.table(textConnection(object = d), header = T)

colnames(dd) %>% 
  strsplit("_") %>% 
  sapply(\(.x) .x[2]) %>% 
  unique() -> vars

dd %>% 
  select(ends_with(all_of(vars)))
#>   a_aa b_aa a_ab b_ab a_ac b_ac
#> 1    2    3    3    3    1    2
#> 2    3    4    1    1    3    1

Created on 2022-07-10 by the reprex package (v2.0.1)

Upvotes: 1

Related Questions