How to order a character vector according to a second character vector made up of substrings of the first?

Question

I want to sort a character vector that looks like this:

x <- c("white","white","blue","green","red","blue","red")

according to a specific order that looks like this:

y <- c("r","white","bl","gree")

If the second vector contained the full strings, an answer for that case already exists. However, in my real data the entries of the first vector are very long, and the entries of the second vector are much shorter (though still long). All entries have different character lengths. My goal is still c("red","red","white","white","blue","blue","green"). In my actual data both vectors contain only unique entries, but I guess a general answer would be more useful. How could I approach this?
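For comparison, the spelled-out case is usually handled in base R with match() and order(). A minimal sketch, assuming a hypothetical full_order vector that holds the complete strings in the desired order:

```r
x <- c("white","white","blue","green","red","blue","red")

# Hypothetical spelled-out version of y (full strings, not prefixes)
full_order <- c("red","white","blue","green")

# match() gives each element's position in full_order; order() sorts by it
sorted <- x[order(match(x, full_order))]
print(sorted)
```

This only works for exact matches, which is why the prefix case needs something like grep() or startsWith().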

GKi · Accepted Answer

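You can use grep in combination with sapply. This only works when there is no overlap in y, i.e. when no element of x starts with more than one entry of y; otherwise that element is returned more than once. It also only returns the elements of x that match some entry of y. The ^ anchors the pattern to the beginning of the string, and value = TRUE makes grep return the matching strings instead of their indices.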

unlist(sapply(paste0("^",y), grep, x, value = TRUE))
#    ^r1     ^r2 ^white1 ^white2    ^bl1    ^bl2   ^gree 
#  "red"   "red" "white" "white"  "blue"  "blue" "green"
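To see the overlap problem concretely, here is a sketch using the extended vectors from the next step ("redd" added to both x and y). It uses lapply instead of sapply so the result is always a list, and unname() only to drop the ^r1-style names:

```r
x <- c("white","white","blue","green","red","blue","red","redd")
y <- c("r","white","bl","gree","redd")

# value = TRUE returns the matching strings, grouped by prefix in the order of y
res <- unname(unlist(lapply(paste0("^", y), grep, x, value = TRUE)))
print(res)
# "redd" appears twice: it matches both "^r" and "^redd"
```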

The following also works when y contains overlapping entries, and keeps the first hit for each element of x:

x  <- c(x, "redd"); y  <- c(y, "redd")

x[unique(unlist(sapply(paste0("^",y), grep, x)))]
#[1] "red"   "red"   "redd"  "white" "white" "blue"  "blue"  "green"

Or keep the last hit:

x[unique(unlist(sapply(paste0("^",y), grep, x)), fromLast = TRUE)]
#[1] "red"   "red"   "white" "white" "blue"  "blue"  "green" "redd"
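Both variants differ only in which duplicate position unique() keeps. A sketch that wraps them in one function; the name order_by_prefix and its from_last argument are hypothetical, not part of the answer:

```r
# Hypothetical helper: order x by the prefixes in y, keeping the first
# (from_last = FALSE) or last (from_last = TRUE) hit for each element of x
order_by_prefix <- function(x, y, from_last = FALSE) {
  # Positions in x matched by each prefix, listed in the order of y
  hits <- unlist(lapply(paste0("^", y), grep, x))
  # Drop repeated positions; fromLast decides which occurrence survives
  x[unique(hits, fromLast = from_last)]
}

x <- c("white","white","blue","green","red","blue","red","redd")
y <- c("r","white","bl","gree","redd")

first_hit <- order_by_prefix(x, y)
last_hit  <- order_by_prefix(x, y, from_last = TRUE)
print(first_hit)
print(last_hit)
```

Here the intermediate hits vector is 5 7 8 1 2 3 6 4 8; position 8 ("redd") appears twice, once for "^r" and once for "^redd".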

To keep all elements of x and place the non-matching ones at the end, you can use:

x  <- c(x, "yellow")

x[unique(c(unlist(sapply(paste0("^",y), grep, x)), seq_along(x)))]
#[1] "red"    "red"    "redd"   "white"  "white"  "blue"   "blue"   "green" 
#[9] "yellow"
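A different technique that gives the same result on this example: compute, for each element of x, the position of the first entry of y it starts with, then sort by that position. This is a sketch, not part of the answer above. It uses startsWith(), which compares literally, so regex metacharacters in y (such as "." or "(") would not be interpreted as they would be with grep:

```r
x <- c("white","white","blue","green","red","blue","red","redd","yellow")
y <- c("r","white","bl","gree","redd")

# Position of the first prefix in y that each element of x starts with,
# or NA when no prefix matches
idx <- vapply(x, function(s) which(startsWith(s, y))[1], integer(1),
              USE.NAMES = FALSE)

# order() keeps ties in their original order and puts NA last by default,
# so unmatched elements such as "yellow" end up at the end
sorted <- x[order(idx)]
print(sorted)
```

Each element of x appears exactly once, so overlapping entries in y cannot produce duplicates.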
