Reputation: 2477
I have a string in R and I would like to match everything after the 2nd occurrence of a word using a regex.
Ex: return everything after the 2nd occurrence of "is" in
"This is a string of example. this is what i should get in return".
Expected output
what i should get in return
I've tried something like ([^is]+)(?:is[^is]+){2}$
but it doesn't work.
Thanks.
Upvotes: 0
Views: 477
Reputation: 47320
You can use unglue:
txt <- "This is a string of example. this is what i should get in return"
library(unglue)
unglue_vec(txt, "{=.*?} is {=.*?} is {x}")
#> [1] "what i should get in return"
Created on 2020-02-26 by the reprex package (v0.3.0)
Upvotes: 0
Reputation: 626845
You may use a PCRE pattern like
^(?>.*?\sis\s+){2}\K.*
See the regex demo
Details
^ - start of string
(?>.*?\sis\s+){2} - an atomic group matching two occurrences of:
  .*? - any 0+ chars other than line break chars, as few as possible
  \s - a whitespace
  is - the word is
  \s+ - 1+ whitespaces
\K - match reset operator
.* - the rest of the line.
x <- "This is a string of example. this is what i should get in return"
regmatches(x, regexpr("^(?>.*?\\sis\\s+){2}\\K.*", x, perl=TRUE))
## => [1] "what i should get in return"
With stringr:
stringr::str_match(x, "^(?>.*?\\sis\\s+){2}(.*)")[,2]
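If the occurrence count or the word varies, the same pattern can be built with sprintf(). This is only a sketch: after_nth() is a hypothetical helper name, it assumes the word contains no regex metacharacters, and it swaps \s for \b so that punctuation next to the word is tolerated.

```r
# Hypothetical helper (not part of base R or stringr):
# returns the text after the n-th standalone occurrence of `word`.
after_nth <- function(x, word, n) {
  # Same shape as the pattern above; `word` is assumed to be regex-safe.
  pattern <- sprintf("^(?>.*?\\b%s\\b\\s*){%d}\\K.*", word, n)
  m <- regexpr(pattern, x, perl = TRUE)
  # regmatches() returns character(0) when there is no n-th occurrence.
  regmatches(x, m)
}

x <- "This is a string of example. this is what i should get in return"
after_nth(x, "is", 2)
## => [1] "what i should get in return"
```

Because the word boundaries stop "is" from matching inside "This" or "this", only the standalone word is counted.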
Upvotes: 1
Reputation: 5138
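Using the stringr package, you could combine str_locate_all() with str_sub(). This takes the second match of "is" ([2, ]) and its end position ([, 2]), i.e. the position of the s in the second "is", and adds one (+ 1) so that the result starts one character to the right of where "is" ends. Note that the result therefore keeps the leading space.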
str_sub(text, str_locate_all(text, "\\bis\\b")[[1]][2, 2] + 1)
[1] " what i should get in return"
Data:
text <- "This is a string of example. this is what i should get in return"
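The same locate-then-cut idea also works in base R, in case stringr is not available. This is a minimal sketch, not a drop-in replacement: hits and end_second are illustrative variable names, and trimws() is added here only to drop the leading space.

```r
text <- "This is a string of example. this is what i should get in return"

# Locate every standalone "is"; gregexpr() returns the match start positions,
# with the match lengths stored in the "match.length" attribute.
hits <- gregexpr("\\bis\\b", text, perl = TRUE)[[1]]

# End position of the second match = start + length - 1.
end_second <- hits[2] + attr(hits, "match.length")[2] - 1

# Keep everything after that position and strip the leading space.
trimws(substring(text, end_second + 1))
## => [1] "what i should get in return"
```

If the string holds fewer than two matches, hits[2] is NA, so a real script would want to check length(hits) first.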
Upvotes: 2