Marco Fumagalli

Reputation: 2477

Match everything after the second occurrence of a word

I have a string in R and would like to match everything after the second occurrence of a word using a regex.

Example: return everything after the second occurrence of "is" in

"This is a string of example. this is what i should get in return".

Expected output

what i should get in return

I've tried something like ([^is]+)(?:is[^is]+){2}$ but it doesn't work.

Thanks.

Upvotes: 0

Views: 477

Answers (3)

moodymudskipper

Reputation: 47320

You can use the unglue package:

txt <- "This is a string of example. this is what i should get in return"

library(unglue)
unglue_vec(txt, "{=.*?} is {=.*?} is {x}")
#> [1] "what i should get in return"

Created on 2020-02-26 by the reprex package (v0.3.0)

Upvotes: 0

Wiktor Stribiżew

Reputation: 626845

You may use a PCRE pattern like

^(?>.*?\sis\s+){2}\K.*

See the regex demo

Details

  • ^ - start of string
  • (?>.*?\sis\s+){2} - an atomic group matching two occurrences of:
    • .*? - any 0+ chars other than line break chars, as few as possible
    • \s - a whitespace
    • is - the literal word is
    • \s+ - 1+ whitespaces
  • \K - match reset operator
  • .* - the rest of the line.

R demo:

x <- "This is a string of example. this is what i should get in return"
regmatches(x, regexpr("^(?>.*?\\sis\\s+){2}\\K.*", x, perl=TRUE))
## => [1] "what i should get in return"

With stringr:

stringr::str_match(x, "^(?>.*?\\sis\\s+){2}(.*)")[,2]
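
If you would rather avoid \K and regmatches(), a base R sketch along the same lines is to delete the matched prefix with sub(). This is a variant rather than the pattern above: it assumes \b word boundaries around "is" instead of surrounding whitespace, and rest is only an illustrative name.

```r
x <- "This is a string of example. this is what i should get in return"

# Remove everything up to and including the second whole-word "is",
# plus any whitespace that follows it. perl = TRUE enables PCRE syntax.
rest <- sub("^(?:.*?\\bis\\b\\s*){2}", "", x, perl = TRUE)
print(rest)
#> [1] "what i should get in return"
```

Note that sub() returns the input unchanged when the word occurs fewer than two times, so check for that case if it matters.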

Upvotes: 1

Andrew

Reputation: 5138

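Using the stringr package, you can combine str_locate_all() with str_sub(). str_locate_all() returns a matrix with the start and end position of every match; [2, 2] picks the end position (column 2) of the second match (row 2) of "is". Adding 1 makes str_sub() start one character to the right of where that "is" ends. The result keeps the leading space; wrap it in trimws() if you do not want it.

library(stringr)
str_sub(text, str_locate_all(text, "\\bis\\b")[[1]][2, 2] + 1)
#> [1] " what i should get in return"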

Data:

text <- "This is a string of example. this is what i should get in return"
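
The same locate-then-cut idea can be sketched in base R with gregexpr(). The helper name after_nth_word and its n argument are hypothetical, made up for this illustration, and the sketch assumes word contains no regex metacharacters.

```r
text <- "This is a string of example. this is what i should get in return"

# Hypothetical helper: text after the n-th whole-word occurrence of `word`.
# Assumes `word` has no regex metacharacters (it is pasted into the pattern).
after_nth_word <- function(text, word, n = 2) {
  hits <- gregexpr(paste0("\\b", word, "\\b"), text, perl = TRUE)[[1]]
  # gregexpr() returns -1 when there is no match at all
  if (hits[1] == -1 || length(hits) < n) return(NA_character_)
  # first position after the n-th match
  end <- hits[n] + attr(hits, "match.length")[n]
  # drop the surrounding whitespace left over from the cut
  trimws(substring(text, end))
}

after_nth_word(text, "is")
#> [1] "what i should get in return"
```

Returning NA when there are fewer than n matches is a design choice for this sketch; the str_locate_all() version above would instead fail with a subscript error in that case.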

Upvotes: 2
