Tom Wenseleers

Reputation: 7989

Removing string parts between substrings when substrings occur multiple times in R

Given a string such as

string="aaaaaaaaaSTARTbbbbbbbbbbSTOPccccccccSTARTddddddddddSTOPeeeeeee"

I would like to remove all parts that occur between START and STOP, yielding

"aaaaaaaaacccccccceeeeeee"

If I try gsub("START(.*)STOP","",string), I get "aaaaaaaaaeeeeeee" instead: everything from the first START to the last STOP is removed.

What would be the correct way to do this, allowing for multiple occurrences of START and STOP?

Upvotes: 3

Views: 125

Answers (2)

maloneypatr

Reputation: 3622

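Not nearly as elegant as Ananda's answer, but it can also be done with the stringr and plyr packages by locating each START/STOP pair and removing the text each pair spans.

library(stringr)
library(plyr)

# Locate every START and STOP (one row per match; columns: start, end)
starts <- ldply(str_locate_all(string, 'START'))[, 1]
ends <- ldply(str_locate_all(string, 'STOP'))[, 2]
# Extract each START...STOP span; assumes the markers alternate and are properly paired
spans <- str_sub(string, starts, ends)
# Remove each span as a literal (non-regex) match
result <- string
for (span in spans) result <- str_replace(result, fixed(span), '')
result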

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

Add a ? after the .* to make the match lazy (non-greedy), so that each START is paired with the nearest following STOP.

gsub("START.*?STOP", "", string)
# [1] "aaaaaaaaacccccccceeeeeee"
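
To see why the ? matters, here is a minimal sketch comparing the greedy and lazy patterns on the string from the question. The variable names greedy and lazy are only illustrative, and the sketch assumes START and STOP always appear in matched pairs.

```r
string <- "aaaaaaaaaSTARTbbbbbbbbbbSTOPccccccccSTARTddddddddddSTOPeeeeeee"

# Greedy: .* consumes as much as possible, so the single match runs
# from the first START to the last STOP
greedy <- gsub("START.*STOP", "", string)
greedy
# [1] "aaaaaaaaaeeeeeee"

# Lazy: .*? consumes as little as possible, so each START is matched
# with the nearest STOP and both blocks are removed separately
lazy <- gsub("START.*?STOP", "", string)
lazy
# [1] "aaaaaaaaacccccccceeeeeee"
```

The text between the two blocks ("cccccccc") survives only in the lazy version, because the greedy match swallows it along with both marker pairs.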

Upvotes: 3
