R - select nth occurrence of regex match

Question

I'm stumped on a regex pattern where I need to select the 2nd whitespace of a string. I have tried the first 5 pages of Google results, and all I come up with is selecting everything up to the 2nd whitespace; I just want to select the 2nd whitespace itself.

This is what I have so far.

library(stringr)
txt <- "the duck is yellow"
str_extract(txt, "(?:[\\w]*)(?:[\\s])(?:[\\w]*)([\\s])")

Another regex I tried was:

(\w+\s\w+\s){2}

I am just not able to find a source that explains how to get the second occurrence of a certain character. I thought something like this would be simple.

Ultimately I want to split the text into 2 columns at the second whitespace.

Ronak Shah · Accepted Answer

To split the data into two columns at the second whitespace, you can use tidyr::extract.

df <- data.frame(txt = "the duck is yellow")
tidyr::extract(df, txt, c('first', 'second'), '(\\w+\\s\\w+)\\s(.*)')

#     first    second
#1 the duck is yellow

Or with strcapture in base R:

strcapture('(\\w+\\s\\w+)\\s(.*)', df$txt, 
           proto = list(first = character(), second = character()))
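
If you want the position of the nth whitespace itself rather than a pattern that consumes everything before it, one possible approach is to locate all whitespace matches with gregexpr and pick the nth. This is only a minimal base R sketch: the helper name split_at_nth_ws is hypothetical (not from any package), it handles a single string, and it assumes that returning NA for the second part is acceptable when there are fewer than n whitespace characters.

```r
# Hypothetical helper: split a single string at its n-th whitespace character.
split_at_nth_ws <- function(x, n = 2) {
  # Start positions of every whitespace character (-1 if there are none)
  ws <- gregexpr("\\s", x)[[1]]

  # Assumption: with fewer than n whitespace characters, leave the string whole
  if (ws[1] == -1 || length(ws) < n) {
    return(c(first = x, second = NA_character_))
  }

  # Position of the n-th whitespace character
  pos <- ws[n]

  # Everything before it, and everything after it
  c(first  = substr(x, 1, pos - 1),
    second = substr(x, pos + 1, nchar(x)))
}

res <- split_at_nth_ws("the duck is yellow", n = 2)
res
#       first      second
#  "the duck" "is yellow"
```

Changing n to 3 would split at the third whitespace instead, which the fixed pattern above cannot do without being rewritten.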
