Starbucks

Reputation: 1568

Remove Specific Part of String when Multiple Split Characters are used

I have read a few Q&As on removing and splitting strings, but I have not come across how to remove one section by splitting on a specific character when that character appears more than once in the string. For instance,

 V <- c("TUAA_2124_5733", "GAMS_1236_4767")
 V1 <- sapply(strsplit(V, split='_', fixed=TRUE), function(x) (x[2]))
 V1
 [1] "2124" "1236"

Keeping only the second element, x[2], of each split gives "2124" and "1236", so it removes both the first section (before the first underscore) and the last section (after the second underscore).

How can I keep the last two sections, still joined by the underscore (2124_5733 and 1236_4767), while removing only the first section (TUAA and GAMS)?

Thanks!

Upvotes: 2

Views: 1081

Answers (2)

G5W

Reputation: 37641

gsub will do this with the right regular expression.

gsub("^.*?_", "", V)
[1] "2124_5733" "1236_4767"

This expression can be understood like this:

The initial ^ means the beginning of the string.
. matches any character, and .* matches zero or more of any character. By default, however, quantifiers are "greedy", so .* would match everything up to the last _. We want the first one, so we use .*?, which makes the quantifier lazy (non-greedy): it matches as few characters as possible, so the match ends at the first _. Putting it all together, ^.*?_ starts at the beginning of the string and matches any number of characters up to and including the first _. That matched text is replaced with nothing ("").
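To make the greedy/lazy difference concrete, here is a minimal sketch comparing both patterns on the vector V from the question. It uses sub() rather than gsub(): since the pattern is anchored with ^, it can match at most once per string, so for this input both functions should give the same result.

```r
V <- c("TUAA_2124_5733", "GAMS_1236_4767")

# Greedy: .* consumes as much as possible, so the match ends at the LAST underscore
greedy <- sub("^.*_", "", V)
print(greedy)  # [1] "5733" "4767"

# Lazy: .*? consumes as little as possible, so the match ends at the FIRST underscore
lazy <- sub("^.*?_", "", V)
print(lazy)    # [1] "2124_5733" "1236_4767"
```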

Upvotes: 4

Toufiq

Reputation: 61

Hope the code below helps:

sub(pattern = "\\w{1,4}_", replacement = "", V)
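One caveat: \\w{1,4}_ assumes the prefix is at most four word characters, and because the pattern is not anchored with ^, a longer prefix is only partially removed (the match simply starts further into the string). The sketch below uses a hypothetical vector W, not from the question, whose second element has a five-character prefix, and compares it with the anchored negated-class pattern ^[^_]*_, which removes everything up to and including the first underscore regardless of the prefix length.

```r
# Hypothetical input (not from the question): second prefix has five characters
W <- c("TUAA_2124_5733", "ABCDE_1236_4767")

# Unanchored \w{1,4}_: for "ABCDE_..." the leftmost match is "BCDE_", so "A" is left behind
unanchored <- sub("\\w{1,4}_", "", W)
print(unanchored)  # [1] "2124_5733"  "A1236_4767"

# Anchored negated class: [^_]* matches any run of non-underscore characters
general <- sub("^[^_]*_", "", W)
print(general)     # [1] "2124_5733" "1236_4767"
```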

Upvotes: 1
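For comparison, the strsplit() approach from the question can also be adapted without a regular expression: instead of keeping only x[2], drop just the first piece with x[-1] and paste the remaining pieces back together with the same separator. A minimal sketch (vapply() stands in for sapply() here only to guarantee a character result):

```r
V <- c("TUAA_2124_5733", "GAMS_1236_4767")

# strsplit() returns a list holding one character vector of pieces per input string
pieces <- strsplit(V, split = "_", fixed = TRUE)

# x[-1] drops only the first piece; paste(collapse = "_") rejoins the rest
kept <- vapply(pieces, function(x) paste(x[-1], collapse = "_"), character(1))
print(kept)  # [1] "2124_5733" "1236_4767"
```

One behavioral difference: a string that contains no underscore comes back as an empty string "" from this version, whereas the sub()/gsub() patterns leave such a string unchanged.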
