Andrei Constantinescu

Reputation: 35

Sort a vector using a "personalised" order in R

I want to sort a character vector called c:

c <- c("AD 2017", "AD 2018 ", "RT 2017", "BL 2017", "BL 2018", "CT 2018")

If I use R's built-in sort function, this is what I get:

> sort(c)
[1] "AD 2017"  "AD 2018 " "BL 2017"  "BL 2018"  "CT 2018"  "RT 2017"

However, let's say I have a different ordering system for the values, which is kept in a matrix and looks like this:

  ORDER VALUE
1     1    RT
2     2    BL
3     3    AD
4     4    CT

How can I sort my "c" vector so that it follows the order from the matrix while also taking the different years into account? My "custom" sorted vector should look like this:

> special_sort(c)
[1] "RT 2017" , "BL 2017" , "BL 2018", "AD 2017" , "AD 2018 " , "CT 2018"

I really need to find a way to automate this, as my database is quite large.

Thank you in advance for your help

Upvotes: 2

Views: 358

Answers (4)

Esben Eickhardt

Reputation: 3872

Data

vector <- c("AD 2017", "AD 2018 ","RT 2017","BL 2017","BL 2018","CT 2018")

Ordering Function

order_fun <- function(vector) {
  df <- data.frame(do.call(rbind, strsplit(vector, " ")))
  df$X1 <- factor(df$X1, levels = c("RT", "BL", "AD", "CT"), labels = c("RT", "BL", "AD", "CT"))
  df <- df[order(df$X1, df$X2), ]
  vector_ordered <- vector[as.numeric(row.names(df))]
  return(vector_ordered)
}

Ordering one-liner

vector[order(factor(substr(vector,1,2), levels = c("RT", "BL", "AD", "CT")), substr(vector,4,7))]

Results

order_fun(vector)
[1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017"  "AD 2018 " "CT 2018" 
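
If the ranking lives in a lookup table rather than in a hard-coded vector, the factor levels can be derived from that table. The following is only a minimal sketch: it assumes a data frame called lookup (a hypothetical name) with the ORDER and VALUE columns from the question, and it deliberately stores the rows out of rank order to show that the ORDER column, not the row position, decides the result.

```r
# Hypothetical lookup table; rows are deliberately NOT stored in rank order
lookup <- data.frame(ORDER = c(3, 1, 4, 2),
                     VALUE = c("AD", "RT", "CT", "BL"),
                     stringsAsFactors = FALSE)

vector <- c("AD 2017", "AD 2018 ", "RT 2017", "BL 2017", "BL 2018", "CT 2018")

# Derive the factor levels from the ORDER column instead of hard-coding them
lvls <- lookup$VALUE[order(lookup$ORDER)]

# First key: custom rank of the two-letter prefix; second key: the year
sorted <- vector[order(factor(substr(vector, 1, 2), levels = lvls),
                       substr(vector, 4, 7))]
print(sorted)
```

Like the one-liner above, this still assumes a two-character prefix followed by a four-digit year.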

Upvotes: 1

ThomasIsCoding

Reputation: 102890

I'm not sure whether the year should also be taken into account when two values share the same alphabetic prefix. If so, the following should help:

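res <- v[order(
  # first key: position of the alphabetic prefix in df$VALUE
  match(gsub("([[:alpha:]]+).*","\\1",v),df$VALUE),
  # second key: the year, kept aligned with the elements of v
  as.numeric(gsub(".*?([[:digit:]]+)","\\1",v)))]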

which gives

> res
[1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017" 
[5] "AD 2018 " "CT 2018" 

Otherwise, if you only care about the order of df$VALUE, v[order(match(gsub("([[:alpha:]]+).*","\\1",v),df$VALUE))] is enough.
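
To reuse the two-key ordering on other vectors, it could be wrapped in a small function. This is a sketch under a few assumptions: the name special_sort is borrowed from the question, the lookup table is passed in as a data frame with a VALUE column, and every element consists of an alphabetic prefix followed by a numeric year. The example input is made up so that the years within a prefix start out in the wrong order.

```r
# Lookup table as in the DATA section of this answer
df <- data.frame(ORDER = 1:4,
                 VALUE = c("RT", "BL", "AD", "CT"),
                 stringsAsFactors = FALSE)

# special_sort is a hypothetical helper name taken from the question
special_sort <- function(x, lookup) {
  clean  <- trimws(x)                                    # ignore stray spaces
  prefix <- sub("^([[:alpha:]]+).*$", "\\1", clean)      # leading letters
  year   <- as.integer(sub("^[^[:digit:]]*", "", clean)) # what follows them
  # rank by position of the prefix in the lookup table, then by year
  x[order(match(prefix, lookup$VALUE), year)]
}

# Made-up input with the years out of order within each prefix
res2 <- special_sort(c("AD 2018", "AD 2017", "RT 2018", "RT 2017"), df)
print(res2)
```

Because the year is an explicit second key, the result does not depend on the order in which the elements arrive.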

DATA:

v <- c("AD 2017", "AD 2018 ", "RT 2017", "BL 2017", "BL 2018", "CT 2018")

df <- structure(list(ORDER = 1:4, VALUE = c("RT", "BL", "AD", "CT")), class = "data.frame", row.names = c("1", 
"2", "3", "4"))

Upvotes: 1

s__

Reputation: 9525

You can try something like this:

# order it by the first two characters, using the chosen factor levels
v[order(factor(substr(v,1,2),levels = c("RT","BL","AD","CT")))]
[1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017"  "AD 2018 " "CT 2018"

So with a matrix:

# use the second column of the matrix in unique(), to order
v[order(factor(substr(v,1,2),levels = unique(mat[,2])))]
[1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017"  "AD 2018 " "CT 2018" 

With vector and matrix:

# your vector
v<-c("AD 2017", "AD 2018 ","RT 2017","BL 2017","BL 2018","CT 2018")

# your matrix
mat <- structure(c("1", "2", "3", "4", "RT", "BL", "AD", "CT"), .Dim = c(4L, 
2L), .Dimnames = list(c("1", "2", "3", "4"), c("ORDER", "VALUE"
)))
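
One thing to watch with the factor approach is prefixes that are missing from the matrix: they become NA and end up last by default. A small sketch of both options follows; the "ZZ 2019" entry is a made-up value that is not in the question's data, and the matrix is rebuilt here only to keep the example self-contained.

```r
# Made-up vector containing one prefix ("ZZ") that is not in the matrix
v <- c("AD 2017", "ZZ 2019", "RT 2017", "BL 2018")

mat <- matrix(c("1", "2", "3", "4", "RT", "BL", "AD", "CT"), ncol = 2,
              dimnames = list(NULL, c("ORDER", "VALUE")))

# Prefixes not found among the levels become NA
key <- factor(substr(v, 1, 2), levels = mat[, "VALUE"])

unknown   <- v[is.na(key)]               # elements with no rank
kept_last <- v[order(key)]               # default: NA keys are placed last
dropped   <- v[order(key, na.last = NA)] # na.last = NA removes them instead

print(unknown)
print(kept_last)
print(dropped)
```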

Upvotes: 2

tmfmnk

Reputation: 40171

Another option could be:

x[order(match(substr(x, 1, 2), df$VALUE))]

[1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017"  "AD 2018 " "CT 2018" 

Sample data:

x <- c("AD 2017", "AD 2018 ","RT 2017","BL 2017","BL 2018","CT 2018")

df <- read.table(text = "  ORDER VALUE
1     1    RT
2     2    BL
3     3    AD
4     4    CT",
                 header = TRUE,
                 stringsAsFactors = FALSE)
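
Since the question mentions a larger database, the same index can also be used to reorder whole rows of a data frame rather than a bare vector. This is a rough sketch only: dat, label and amount are hypothetical names and values that do not come from the question.

```r
# Lookup table with the custom ranking
df <- data.frame(ORDER = 1:4,
                 VALUE = c("RT", "BL", "AD", "CT"),
                 stringsAsFactors = FALSE)

# Hypothetical table whose rows should follow the custom ranking
dat <- data.frame(label  = c("AD 2017", "CT 2018", "RT 2017", "BL 2018"),
                  amount = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)

# Compute the row order from the label column, then index the rows with it
idx <- order(match(substr(dat$label, 1, 2), df$VALUE))
dat_sorted <- dat[idx, ]
print(dat_sorted)
```

The other columns travel with their labels because the whole row is indexed at once.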

Upvotes: 2
