Reputation: 2157
I'm trying to sort a vector in R.
This is an example of what the vector looks like:
test = c("Xpsomethingelse", "3qsometext", "22qsomeothertext")
Simple sorting results in:
> sort(test)
[1] "22qsomeothertext" "3qsometext" "Xpsomethingelse"
However, I want to sort in a custom order, based on the first one or two characters of each string. I have created another vector that represents the order that should be followed:
order_custom = c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9","10","11","12","14","15","16","17","19","20","22")
I thought of
test[order(match(test, order_custom))]
But this only matches the complete string, while I'm looking for a match with the start of the string. Everything before the 'p' or 'q' character should be taken into account. In regex terms the match should be something like [0-9,X,Y]{1,2}, I think, but I don't see how I can sort based on this type of match.
The final result should look like this
[1] "Xpsomethingelse", "3qsometext", "22qsomeothertext"
Upvotes: 1
Views: 625
Reputation: 39647
You can use sub to remove p or q and everything after it, and then use match and order.
test[order(match(sub("[pq].*", "", test), order_custom))]
#[1] "Xpsomethingelse" "3qsometext" "22qsomeothertext"
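To see why the one-liner works, it can help to look at each intermediate step. This is a minimal sketch using the vectors from the question; the names prefix, pos and sorted are only illustrative.

```r
test <- c("Xpsomethingelse", "3qsometext", "22qsomeothertext")
order_custom <- c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9",
                  "10","11","12","14","15","16","17","19","20","22")

# Step 1: drop the first "p" or "q" and everything after it,
# leaving only the leading key of each string.
prefix <- sub("[pq].*", "", test)

# Step 2: look up the position of each key in the custom order.
# match() returns NA for keys that are not in order_custom.
pos <- match(prefix, order_custom)

# Step 3: reorder the original strings by those positions.
# order() places NA positions last by default.
sorted <- test[order(pos)]
print(sorted)
```

Note that a string containing neither p nor q is left unchanged by sub, so it will usually get an NA position and end up at the end of the result.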
Upvotes: 1
Reputation: 2419
Here is an intuitive solution that sorts by the rank of h3, the leading one- or two-character key of each string.
library(data.table)
test = c("22qsomeothertext", "3qsometext", "Xpsomethingelse")
order_custom = c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9","10","11","12","14","15","16","17","19","20","22")
dt <- data.table(test)
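# h1 = first character, h2 = first two characters
dt[, `:=`(h1 = substr(test, 1, 1),
          h2 = substr(test, 1, 2))]
# use h2 as the key when it is a valid two-character entry, otherwise h1
dt[, h3 := fcase(h1 %in% order_custom & !(h2 %in% order_custom), h1,
                 h2 %in% order_custom, h2,
                 default = NA)]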
dt[,rank := match(h3, order_custom)][]
#> test h1 h2 h3 rank
#> 1: 22qsomeothertext 2 22 22 24
#> 2: 3qsometext 3 3q 3 8
#> 3: Xpsomethingelse X Xp X 4
desired_string <- dt[order(rank),test]
Created on 2021-07-15 by the reprex package (v2.0.0)
Upvotes: 0
Reputation: 1528
You can use your original code by providing a regular expression that matches whatever comes before the first "p" or "q":
library(stringi)
test = c("Xpsomethingelse", "3qsometext", "22qsomeothertext")
order_custom = c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9","10","11","12","14","15","16","17","19","20","22")
test[order(match(stri_extract(test, regex=".+?(?=[pq])"), order_custom))]
#> [1] "Xpsomethingelse" "3qsometext" "22qsomeothertext"
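If an extra package is not wanted, a similar extraction can be sketched in base R with regexpr and regmatches. This assumes, as the question describes, that the key is one or two digits or a single X or Y, directly followed by p or q; the pattern and the names m, prefix and result are illustrative, not part of the original answer.

```r
# Input deliberately given in a different order than the desired result
test <- c("22qsomeothertext", "3qsometext", "Xpsomethingelse")
order_custom <- c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9",
                  "10","11","12","14","15","16","17","19","20","22")

# Anchor at the start of the string; the lookahead (?=[pq]) needs perl = TRUE.
m <- regexpr("^([0-9]{1,2}|[XY])(?=[pq])", test, perl = TRUE)

# regmatches() drops non-matching elements, so fill a full-length
# vector and leave NA where no key was found.
prefix <- rep(NA_character_, length(test))
prefix[m > 0] <- regmatches(test, m)

result <- test[order(match(prefix, order_custom))]
print(result)
```

Inside a character class a comma is a literal character, so the class in the question's pattern would also accept commas; the alternation above avoids that.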
Created on 2021-07-15 by the reprex package (v2.0.0)
Upvotes: 1