Vesnič

Reputation: 375

Sort a vector of strings by their trailing characters

I have a dataframe with a number of repeated column names that are distinguished by a numeric suffix. It looks something like this:

temp <- c("DTA_1", "DTA_2", "DTA_3", "OCI_1", "OCI_2", "OCI_3", "Time_1", "Time_2", "Time_3")

In the end it should look like this:

temp <- c("DTA_1", "Time_1", "OCI_1", "DTA_2", "Time_2", "OCI_2", "DTA_3", "Time_3", "OCI_3")

I've started working on it and I came to this:

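reversed <- character(length(temp))
for (i in seq_along(temp)) {
   # reverse the characters of each string and keep the result
   reversed[i] <- paste(rev(strsplit(temp[i], "")[[1]]), collapse = "")
}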

But then I realized I would have to sort the reversed strings and then turn them all around again, which seemed needlessly clumsy.

Is there a better, more elegant way to do it?

Upvotes: 1

Views: 50

Answers (2)

akrun

Reputation: 887118

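One option is to read the vector into a two-column data.frame by using _ as the delimiter, convert the first column to a factor with the desired level order, and then call order() on the columns (number first, then name) to get an index for reordering the vector: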

temp[do.call(order, transform(read.table(text = temp, header = FALSE, 
    sep="_"), V1 = factor(V1, levels = c("DTA", "Time", "OCI")))[2:1])]
#[1] "DTA_1"  "Time_1" "OCI_1"  "DTA_2"  "Time_2" "OCI_2"  "DTA_3"  "Time_3" "OCI_3" 

Or, as @d.b mentioned in the comments, instead of converting to a factor, use match() to get each name's position in the desired order and order on that index:

temp[with(read.table(text = temp, sep = "_"), order(V2, 
          match(V1, c("DTA", "Time", "OCI"))))]
#[1] "DTA_1"  "Time_1" "OCI_1"  "DTA_2"  "Time_2" "OCI_2"  "DTA_3"  "Time_3" "OCI_3" 

Or an option using the tidyverse:

library(tidyverse)  # attaches dplyr, tidyr and forcats, so a separate library(forcats) is not needed
tibble(temp) %>% 
  separate(temp, into = c('t1', 't2'), convert = TRUE) %>% 
  arrange(t2, fct_relevel(t1, c('DTA', 'Time', 'OCI'))) %>%
  unite(temp, t1, t2, sep="_") %>% 
  pull(temp)
#[1] "DTA_1"  "Time_1" "OCI_1"  "DTA_2"  "Time_2" "OCI_2"  "DTA_3"  "Time_3" "OCI_3" 

Upvotes: 2

d.b

Reputation: 32548

You can specify a custom order for the strings by converting the name part to a factor and giving the desired order in its levels. Ordering first by the numeric suffix and then by that factor gives the result:

temp[order(as.numeric(gsub("\\D", "", temp)),
            factor(gsub("_\\d+", "", temp), levels = c("DTA", "Time", "OCI")))]
#[1] "DTA_1"  "Time_1" "OCI_1"  "DTA_2"  "Time_2" "OCI_2"  "DTA_3"  "Time_3" "OCI_3"
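
Since the question is about dataframe column names, the same idea could be wrapped in a small helper and applied to names(df). This is only a sketch: sort_by_suffix and the one-row df are made-up names for illustration, and it assumes every name has the form prefix_number. It uses match() rather than a factor for the custom prefix order.

```r
# Hypothetical helper (not from the answers above): order strings of the
# form "<prefix>_<number>" by number first, then by a custom prefix order.
sort_by_suffix <- function(x, prefix_order) {
  prefix <- sub("_\\d+$", "", x)            # text before the trailing "_<digits>"
  suffix <- as.integer(sub("^.*_", "", x))  # trailing digits, compared as integers
  x[order(suffix, match(prefix, prefix_order))]
}

temp <- c("DTA_1", "DTA_2", "DTA_3", "OCI_1", "OCI_2", "OCI_3",
          "Time_1", "Time_2", "Time_3")
sorted <- sort_by_suffix(temp, c("DTA", "Time", "OCI"))

# Reorder the columns of a data frame the same way (df is made-up data)
df <- as.data.frame(matrix(0, nrow = 1, ncol = length(temp),
                           dimnames = list(NULL, temp)))
df <- df[sort_by_suffix(names(df), c("DTA", "Time", "OCI"))]
```

Because the suffix is converted with as.integer(), names such as DTA_10 would sort after DTA_9 rather than after DTA_1.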

Upvotes: 4
