user1987607

Reputation: 2157

R custom ordering of character vector by matching the first character

I'm trying to sort a vector in R.

This is an example of what the vector looks like:

test = c("Xpsomethingelse", "3qsometext", "22qsomeothertext")

Simple sorting results in:

> sort(test)
[1] "22qsomeothertext" "3qsometext"       "Xpsomethingelse" 

However, I want to sort in a custom order, based on the first one or two characters of each string. I have created another vector that represents the order that should be followed:

order_custom = c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9","10","11","12","14","15","16","17","19","20","22")

I thought of

test[order(match(test, order_custom))]

But this matches only the complete string, whereas I need to match the start of the string: everything before the 'p' or 'q' character should be taken into account. In regex terms, I think the match should be [0-9,X,Y]{1,2}, but I don't see how to sort based on this type of match.
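A quick way to see why the whole-string match cannot work, as a minimal sketch: none of the full strings appear in order_custom, so match() returns NA for every element and order() hands the vector back in its input order. The shuffled input below is an assumption made for illustration, since the example vector above already happens to be in the desired order.

```r
# Shuffled copy of the example vector (illustrative assumption)
test <- c("22qsomeothertext", "Xpsomethingelse", "3qsometext")
order_custom <- c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9",
                  "10","11","12","14","15","16","17","19","20","22")

# Whole strings never occur in order_custom, so every lookup is NA
idx <- match(test, order_custom)
all_missing <- all(is.na(idx))

# order() puts NAs last and keeps ties in their original order,
# so indexing with it leaves the vector unsorted
unchanged <- identical(test[order(idx)], test)
```

Both all_missing and unchanged should come out TRUE here, which is why the attempt only looks right when the input is already sorted.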

The final result should look like this

[1] "Xpsomethingelse", "3qsometext", "22qsomeothertext"

Upvotes: 1

Views: 625

Answers (3)

GKi

Reputation: 39647

You can use sub to remove p or q and everything afterwards and then use match and order.

test[order(match(sub("[pq].*", "", test), order_custom))]
#[1] "Xpsomethingelse"  "3qsometext"       "22qsomeothertext"
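The same idea can be wrapped in a small function, sketched here under the assumption that every string contains a "p" or "q" right after its key. The name sort_by_prefix and the extra strings "Yqfoo" and "13pbar" are hypothetical and only serve the illustration.

```r
# Hypothetical helper (not part of the answer above): sort x by the
# prefix that precedes the first "p" or "q", following the order in ord
sort_by_prefix <- function(x, ord) {
  key <- sub("[pq].*", "", x)   # "22qsomeothertext" -> "22", "Xpsomethingelse" -> "X"
  x[order(match(key, ord))]     # keys missing from ord become NA and sort last
}

order_custom <- c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9",
                  "10","11","12","14","15","16","17","19","20","22")
test <- c("22qsomeothertext", "Yqfoo", "3qsometext", "Xpsomethingelse", "13pbar")

sorted <- sort_by_prefix(test, order_custom)
sorted
#[1] "13pbar"           "Xpsomethingelse"  "Yqfoo"            "3qsometext"       "22qsomeothertext"
```

Because match() returns NA for unknown keys and order() places NAs last by default, strings with an unexpected prefix end up at the end instead of raising an error.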

Upvotes: 1

Peace Wang

Reputation: 2419

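Here is an intuitive solution that sorts by the rank of h3, where h3 is the two-character prefix (h2) when that prefix is a valid key and the first character (h1) otherwise.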

library(data.table)
test = c("22qsomeothertext", "3qsometext", "Xpsomethingelse")
order_custom = c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9","10","11","12","14","15","16","17","19","20","22")

dt <- data.table(test)
dt[,`:=`(h1 = substr(test,1,1),
         h2 = substr(test,1,2)) ]

# Use h1 when only the first character is a valid key, h2 when the two-character prefix is
dt[,h3 := fcase(h1 %in% order_custom & !(h2 %in% order_custom), h1,
                h1 %in% order_custom & (h2 %in% order_custom), h2,
                default = NA)]

dt[,rank := match(h3, order_custom)][]
#>                test h1 h2 h3 rank
#> 1: 22qsomeothertext  2 22 22   24
#> 2:       3qsometext  3 3q  3    8
#> 3:  Xpsomethingelse  X Xp  X    4

desired_string <-  dt[order(rank),test]

Created on 2021-07-15 by the reprex package (v2.0.0)
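For comparison, the same prefix logic can be sketched in base R without data.table. This is not the answer's code, only an equivalent under the assumption that a key is always either the first two characters or the first character of the string; the variable name key is introduced here for illustration.

```r
test <- c("22qsomeothertext", "3qsometext", "Xpsomethingelse")
order_custom <- c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9",
                  "10","11","12","14","15","16","17","19","20","22")

h1 <- substr(test, 1, 1)   # first character, e.g. "2", "3", "X"
h2 <- substr(test, 1, 2)   # first two characters, e.g. "22", "3q", "Xp"

# Prefer the two-character prefix when it is a valid key,
# otherwise fall back to the first character, otherwise NA
key <- ifelse(h2 %in% order_custom, h2,
              ifelse(h1 %in% order_custom, h1, NA_character_))

desired_string <- test[order(match(key, order_custom))]
desired_string
#[1] "Xpsomethingelse"  "3qsometext"       "22qsomeothertext"
```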

Upvotes: 0

Claudio

Reputation: 1528

You can use your original code by providing a regular expression that matches whatever comes before a "p" or a "q":

library(stringi)

test = c("Xpsomethingelse", "3qsometext", "22qsomeothertext")

order_custom = c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9","10","11","12","14","15","16","17","19","20","22")

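# Extract everything from the start of the string up to, but not including, the first "p" or "q"
test[order(match(stri_extract(test, regex="^[^pq]+(?=[pq])"), order_custom))]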
#> [1] "Xpsomethingelse"  "3qsometext"       "22qsomeothertext"

Created on 2021-07-15 by the reprex package (v2.0.0)
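If adding a dependency is not desirable, a similar extraction can be sketched in base R with regexpr() and perl = TRUE. The pattern below is an assumption based on the question's description (one or two digits, or X or Y, directly followed by "p" or "q"), and the string "nomatch" is a hypothetical input added to show how non-matching elements behave.

```r
test <- c("22qsomeothertext", "nomatch", "Xpsomethingelse", "3qsometext")
order_custom <- c("21","18","13","X","Y","1","2","3","4","5","6","7","8","9",
                  "10","11","12","14","15","16","17","19","20","22")

# Anchored pattern: the key must sit at the start and be followed by "p" or "q"
m <- regexpr("^([0-9]{1,2}|[XY])(?=[pq])", test, perl = TRUE)
len <- attr(m, "match.length")   # -1 where the pattern did not match

# Take the matched prefix, or NA when there was no match
key <- ifelse(len > 0, substr(test, 1, len), NA_character_)

result <- test[order(match(key, order_custom))]
result
#[1] "Xpsomethingelse"  "3qsometext"       "22qsomeothertext" "nomatch"
```

Anchoring the pattern at the start keeps a later "p" or "q" in the string from affecting the extracted key, and elements without a valid key sort to the end.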

Upvotes: 1
