compbiostats

Reputation: 961

Extract all text after last occurrence of a special character

I have the string in R

BLCU142-09|Apodemia_mejicanus

and I would like to get the result

Apodemia_mejicanus

Using the stringr R package, I have tried

str_replace_all("BLCU142-09|Apodemia_mejicanus", "[[A-Z0-9|-]]", "")
# [1] "podemia_mejicanus"

which is almost what I need, except that the A is missing.

Upvotes: 2

Views: 2483

Answers (4)

bcarlsen

Reputation: 1441

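I would keep it simple. Note that this splits at the first |, which is enough for the example string:

my_string <- "BLCU142-09|Apodemia_mejicanus"
substring(my_string, regexpr("|", my_string, fixed = TRUE) + 1L)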
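Since regexpr() reports only the first match, a string with several pipes needs one extra step to honor the "last occurrence" in the title. A minimal base R sketch, assuming a made-up input with multiple pipes (the names pipe_positions, last_pipe and result are illustrative only):

```r
# Hypothetical input with several pipes; only the text after the last one is wanted.
my_string <- "a|b|c|BLCU142-09|Apodemia_mejicanus"

# gregexpr() returns the start position of every literal "|" (fixed = TRUE
# switches off regex interpretation, so no escaping is needed).
pipe_positions <- gregexpr("|", my_string, fixed = TRUE)[[1]]

# Keep everything after the last pipe.
last_pipe <- max(pipe_positions)
result <- substring(my_string, last_pipe + 1L)
result
# [1] "Apodemia_mejicanus"
```

If the string contains no pipe at all, gregexpr() returns -1, so substring() starts at position 0 and the whole string comes back unchanged.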

Upvotes: 0

Ben Bolker

Reputation: 226097

You can always choose to _extract rather than _remove:

s <- "BLCU142-09|Apodemia_mejicanus"
stringr::str_extract(s,"[[:alpha:]_]+$")
## [1] "Apodemia_mejicanus"

Depending on how permissive you want to be, you could also use [[:alpha:]]+_[[:alpha:]]+ as your target.
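To make the trade-off between the two patterns concrete, here is a rough base R sketch that uses regexpr() and regmatches() in place of str_extract. The second input and the helper extract_first() are invented for illustration and are not part of the answer above:

```r
# First element is the string from the question; the second is a made-up
# label with no underscore, to show where the two patterns differ.
s <- c("BLCU142-09|Apodemia_mejicanus", "BLCU142-09|Apodemia")

# Hypothetical helper: first match per element, NA where nothing matches
# (mirroring how str_extract reports a non-match).
extract_first <- function(pattern, x) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  out
}

permissive <- extract_first("[[:alpha:]_]+$", s)
strict <- extract_first("[[:alpha:]]+_[[:alpha:]]+", s)

permissive
# [1] "Apodemia_mejicanus" "Apodemia"
strict
# [1] "Apodemia_mejicanus" NA
```

The stricter pattern insists on the genus_species shape, so malformed labels show up as NA instead of slipping through.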

Upvotes: 2

Wiktor Stribiżew

Reputation: 626728

You can use

sub(".*\\|", "", x)

This will remove all text up to and including the last pipe char. Details:

  • .* - any zero or more chars as many as possible
  • \| - a literal | char (| is a regex metacharacter, the alternation operator, so it must be escaped; because R string literals process backslash escapes themselves, the backslash has to be doubled, giving "\\|").

For example:

x <- c("BLCU142-09|Apodemia_mejicanus", "a|b|c|BLCU142-09|Apodemia_mejicanus")
sub(".*\\|", "", x)
## => [1] "Apodemia_mejicanus" "Apodemia_mejicanus"
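The "as many as possible" part is what makes this hit the last pipe. As a small sketch of the difference, the lazy quantifier .*? (used here with perl = TRUE, which is an assumption of this example and not part of the answer) stops at the first pipe instead; the variable names are illustrative:

```r
x <- "a|b|c|BLCU142-09|Apodemia_mejicanus"

# Greedy: .* grabs as much as it can, so the match ends at the LAST pipe.
greedy <- sub(".*\\|", "", x)

# Lazy: .*? grabs as little as it can, so the match ends at the FIRST pipe.
lazy <- sub(".*?\\|", "", x, perl = TRUE)

# A one-character bracket class is another way to write a literal pipe
# without any backslashes.
bracket <- sub(".*[|]", "", x)

greedy
# [1] "Apodemia_mejicanus"
lazy
# [1] "b|c|BLCU142-09|Apodemia_mejicanus"
```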

Upvotes: 5

akrun

Reputation: 886998

We can use str_remove with a pattern that matches, from the start (^) of the string, one or more characters that are not a | ([^|]+) followed by a |. That whole prefix is then removed:

library(stringr)
str_remove(str1, "^[^|]+\\|")
#[1] "Apodemia_mejicanus"

If [A-Z] is part of an unanchored pattern, as in the OP's str_replace_all call, it also matches the uppercase A in Apodemia and replaces it with a blank ("").
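The effect can be reproduced in base R. This is only a sketch: it uses gsub and sub with a simplified character class and perl = TRUE, not the OP's exact stringr call, and the variable names are made up for illustration:

```r
s <- "BLCU142-09|Apodemia_mejicanus"

# Unanchored class applied everywhere: every uppercase letter, digit, pipe
# and hyphen is dropped, including the "A" that should have been kept.
everywhere <- gsub("[A-Z0-9|-]", "", s, perl = TRUE)

# Same class anchored to the start and ending at the pipe: only the
# prefix is removed, so the species name survives intact.
prefix_only <- sub("^[A-Z0-9-]+\\|", "", s, perl = TRUE)

everywhere
# [1] "podemia_mejicanus"
prefix_only
# [1] "Apodemia_mejicanus"
```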

data

str1 <- "BLCU142-09|Apodemia_mejicanus"

Upvotes: 4
