Reputation: 1568
I have read a few Q&As on removing and splitting strings, but I have not come across one that covers removing a section by splitting on a specific character when that character appears more than once in the string. For instance:
V <- c("TUAA_2124_5733", "GAMS_1236_4767")
V1 <- sapply(strsplit(V, split='_', fixed=TRUE), function(x) (x[2]))
V1
[1] "2124" "1236"
This part of the code removes both the first underscore-separated section and the last one:
sapply(strsplit(V, split='_', fixed=TRUE), function(x) (x[2]))
How can I keep the last two sections (2124_5733 and 1236_4767), still joined by the underscore, while removing only the first section (TUAA and GAMS)?
Thanks!
Upvotes: 2
Views: 1081
Reputation: 37641
gsub will do this with the right regular expression.
gsub("^.*?_", "", V)
[1] "2124_5733" "1236_4767"
This expression can be understood like this:
The initial ^ means the beginning of the string.
. means any character, and .* means zero or more instances of any character.
However, the default is "greedy matching", so .* would match all characters up to the last _. We want the first one, so we use .*? which suppresses the greedy matching and will only match up to the first _.
So putting it all together, ^.*?_ starts at the beginning of the string and matches any number of characters up to and including the first _. These are replaced with nothing.
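To make the greedy versus non-greedy difference concrete, here is a small sketch that runs both patterns on the vector from the question. The variable names (greedy, lazy, negated) are only for illustration, and the negated character class is an alternative I am adding, not part of the expression explained above. Since the pattern is anchored with ^ it can match at most once per string, so sub is used here in place of gsub.

```r
V <- c("TUAA_2124_5733", "GAMS_1236_4767")

# Greedy: .* runs as far as it can, so the match ends at the LAST underscore
greedy <- sub("^.*_", "", V)

# Non-greedy: .*? stops as early as it can, so the match ends at the FIRST underscore
lazy <- sub("^.*?_", "", V)

# Alternative (assumption, not from the answer above): [^_]* cannot cross an
# underscore, so it also stops at the first one without needing the ? modifier
negated <- sub("^[^_]*_", "", V)

print(greedy)
print(lazy)
print(negated)
```

The greedy version leaves only the final section, while the other two leave the last two sections joined by the underscore.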
Upvotes: 4
Reputation: 61
Hope the code below helps:
sub(pattern = "\\w{1,4}_", replacement = "", V)
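One thing to keep in mind: this pattern is not anchored and allows at most four word characters before the underscore, so it fits the four-letter prefixes in the question. The sketch below, using a made-up input with a longer prefix (the value "LONGER_2124_5733" is hypothetical), shows where that assumption matters and one possible anchored variant.

```r
V <- c("TUAA_2124_5733", "GAMS_1236_4767")

# Works for the data in the question: each prefix is exactly four characters
short_ok <- sub(pattern = "\\w{1,4}_", replacement = "", V)

# Hypothetical input with a six-character prefix
x <- "LONGER_2124_5733"

# Unanchored and capped at four characters, so only the tail of the prefix is removed
unanchored <- sub(pattern = "\\w{1,4}_", replacement = "", x)

# Possible variant (an assumption, not the original answer): anchor at the start
# and allow any number of letters or digits before the first underscore
anchored <- sub(pattern = "^[[:alnum:]]+_", replacement = "", x)

print(short_ok)
print(unanchored)
print(anchored)
```

If the prefixes are always one to four characters long, the original pattern is fine as written.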
Upvotes: 1