Reputation: 35
I want to sort a character vector called c:
c<-c("AD 2017", "AD 2018 ","RT 2017","BL 2017","BL 2018","CT 2018")
If I use R's built-in function sort, this is what I get:
> sort(c)
[1] "AD 2017" "AD 2018 " "BL 2017" "BL 2018" "CT 2018" "RT 2017"
However, let's say I have a different ordering system for the values, which is kept in a matrix and looks like this:
ORDER VALUE
1 1 RT
2 2 BL
3 3 AD
4 4 CT
How can I sort my "c" vector so that it follows the prefix order from the matrix, while still sorting by year within each prefix? My "custom" sorted vector should look like this:
> special_sort(c)
[1] "RT 2017" "BL 2017" "BL 2018" "AD 2017" "AD 2018 " "CT 2018"
I really need to find a way to automate this, as my database is quite large.
Thank you in advance for your help
Upvotes: 2
Views: 358
Reputation: 3872
vector <- c("AD 2017", "AD 2018 ","RT 2017","BL 2017","BL 2018","CT 2018")
order_fun <- function(vector) {
  # split each element into a prefix (X1) and a year (X2)
  df <- data.frame(do.call(rbind, strsplit(vector, " ")))
  # the factor levels define the custom prefix order
  df$X1 <- factor(df$X1, levels = c("RT", "BL", "AD", "CT"))
  df <- df[order(df$X1, df$X2), ]
  # the row names still hold the original positions, so use them to index the input
  vector_ordered <- vector[as.numeric(row.names(df))]
  return(vector_ordered)
}
# the same result as a one-liner:
vector[order(factor(substr(vector,1,2), levels = c("RT", "BL", "AD", "CT")), substr(vector,4,7))]
order_fun(vector)
[1] "RT 2017" "BL 2017" "BL 2018" "AD 2017" "AD 2018 " "CT 2018"
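Since the question asks for a `special_sort()` that reads the order from a table instead of hard-coded levels, here is a minimal sketch of one way to wrap the idea. It swaps the factor for `match()` against the table. The names `special_sort`, `order_table` and `vec` are hypothetical, and the table is assumed to have a `VALUE` column listing the prefixes in the desired order (a matrix or a data frame both work with the indexing used here).

```r
# Hypothetical helper: sort "PREFIX YEAR" strings by a custom prefix order,
# then by year within each prefix.
special_sort <- function(x, order_table) {
  cleaned <- trimws(x)                                  # ignore stray spaces such as in "AD 2018 "
  prefix  <- sub("\\s.*$", "", cleaned)                 # text before the first whitespace
  year    <- as.integer(sub("^\\S+\\s+", "", cleaned))  # text after it, parsed as a number
  rank    <- match(prefix, order_table[, "VALUE"])      # position of each prefix in the table
  x[order(rank, year)]                                  # return the original, untrimmed strings
}

# Assumed layout of the ordering table, mirroring the one in the question
order_table <- data.frame(ORDER = 1:4, VALUE = c("RT", "BL", "AD", "CT"),
                          stringsAsFactors = FALSE)
vec <- c("AD 2017", "AD 2018 ", "RT 2017", "BL 2017", "BL 2018", "CT 2018")

result <- special_sort(vec, order_table)
print(result)
```

Splitting on whitespace rather than on fixed character positions means the sketch should also cope with prefixes that are not exactly two letters long.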
Upvotes: 1
Reputation: 102890
I'm not sure whether the year should also be taken into account when two elements share the same alphabetic prefix. If it should, the following can help:
res <- v[order(
  match(gsub("([[:alpha:]]+).*","\\1",v),df$VALUE),
  as.numeric(gsub(".*?([[:digit:]]+)","\\1",v)))]
which gives
> res
[1] "RT 2017" "BL 2017" "BL 2018" "AD 2017"
[5] "AD 2018 " "CT 2018"
Otherwise, v[order(match(gsub("([[:alpha:]]+).*","\\1",v),df$VALUE))]
is enough if you only care about the order of df$VALUE.
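To make the pieces easier to follow, here is a small sketch that evaluates each sort key on its own, using the sample vector from the question. The intermediate names `prefix`, `rank` and `year` are only for illustration.

```r
v  <- c("AD 2017", "AD 2018 ", "RT 2017", "BL 2017", "BL 2018", "CT 2018")
df <- data.frame(ORDER = 1:4, VALUE = c("RT", "BL", "AD", "CT"),
                 stringsAsFactors = FALSE)

prefix <- gsub("([[:alpha:]]+).*", "\\1", v)               # keep only the leading letters
rank   <- match(prefix, df$VALUE)                          # position of each prefix in the custom order
year   <- as.numeric(gsub(".*?([[:digit:]]+)", "\\1", v))  # drop everything before the digits

print(prefix)  # "AD" "AD" "RT" "BL" "BL" "CT"
print(rank)    # 3 3 1 2 2 4
print(year)    # 2017 2018 2017 2017 2018 2018
```

Both keys are vectors aligned with `v`, element by element, which is what `order()` needs in order to break ties in the first key with the second.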
DATA:
v <- c("AD 2017", "AD 2018 ","RT 2017","BL 2017","BL 2018","CT 2018")
df <- structure(list(ORDER = 1:4, VALUE = c("RT", "BL", "AD", "CT")), class = "data.frame", row.names = c("1",
"2", "3", "4"))
Upvotes: 1
Reputation: 9525
You can try something like this:
# order by the first two characters, using the chosen factor levels
v[order(factor(substr(v,1,2),levels = c("RT","BL","AD","CT")))]
[1] "RT 2017" "BL 2017" "BL 2018" "AD 2017" "AD 2018 " "CT 2018"
So with a matrix:
# take the factor levels from the second column of the matrix
v[order(factor(substr(v,1,2),levels = unique(mat[,2])))]
[1] "RT 2017" "BL 2017" "BL 2018" "AD 2017" "AD 2018 " "CT 2018"
With vector and matrix:
# your vector
v<-c("AD 2017", "AD 2018 ","RT 2017","BL 2017","BL 2018","CT 2018")
# your matrix
mat <- structure(c("1", "2", "3", "4", "RT", "BL", "AD", "CT"), .Dim = c(4L,
2L), .Dimnames = list(c("1", "2", "3", "4"), c("ORDER", "VALUE"
)))
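Note that the calls above sort on the prefix only. Because `order()` is stable, elements with the same prefix keep their input order, which happens to be by year in the sample vector. A short sketch with a made-up vector (`shuffled`, not from the question) shows the difference once the year is added as a second key:

```r
level_order <- c("RT", "BL", "AD", "CT")         # same levels as above
shuffled <- c("BL 2018", "RT 2017", "BL 2017")   # hypothetical input with years out of order

prefix_key <- factor(substr(shuffled, 1, 2), levels = level_order)

# prefix only: the two "BL" entries stay in their input order
by_prefix <- shuffled[order(prefix_key)]
# prefix, then the four year characters as a tie-breaker
by_prefix_year <- shuffled[order(prefix_key, substr(shuffled, 4, 7))]

print(by_prefix)       # "RT 2017" "BL 2018" "BL 2017"
print(by_prefix_year)  # "RT 2017" "BL 2017" "BL 2018"
```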
Upvotes: 2
Reputation: 40171
Another option could be:
x[order(match(substr(x, 1, 2), df$VALUE))]
[1] "RT 2017" "BL 2017" "BL 2018" "AD 2017" "AD 2018 " "CT 2018"
Sample data:
x <- c("AD 2017", "AD 2018 ","RT 2017","BL 2017","BL 2018","CT 2018")
df <- read.table(text = " ORDER VALUE
1 1 RT
2 2 BL
3 3 AD
4 4 CT",
header = TRUE,
stringsAsFactors = FALSE)
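One thing to be aware of with a large data set: `match()` returns `NA` for any prefix that is missing from the table, and `order()` places `NA` last by default. A small sketch, where `x2` and the prefix "ZZ" are made up for illustration:

```r
df <- data.frame(ORDER = 1:4, VALUE = c("RT", "BL", "AD", "CT"),
                 stringsAsFactors = FALSE)
x2 <- c("ZZ 2017", "AD 2017", "RT 2017")   # "ZZ" does not appear in df$VALUE

rank <- match(substr(x2, 1, 2), df$VALUE)
sorted <- x2[order(rank)]                  # unmatched prefixes end up at the end
unknown <- x2[is.na(rank)]                 # elements whose prefix is not in the table

print(rank)     # NA 3 1
print(sorted)   # "RT 2017" "AD 2017" "ZZ 2017"
print(unknown)  # "ZZ 2017"
```

Checking `unknown` before sorting may be a cheap way to spot prefixes that the ordering table does not cover.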
Upvotes: 2