binga30

Reputation: 59

R - select nth occurrence of regex match

I'm stumped on a regex pattern where I need to select the 2nd whitespace of a string. I have gone through the first 5 pages of Google results, and all I can find is how to select everything up to the 2nd whitespace. I just want to select the 2nd whitespace itself.

This is what I have so far.

library(stringr)  # provides str_extract

txt <- "the duck is yellow"
str_extract(txt,"(?:[\\w]*)(?:[\\s])(?:[\\w]*)([\\s])")

Another regex I tried was:

(\w+\s\w+\s){2}

I am just not able to find a source that explains how to get the second occurrence of a certain character. I thought something like this would be simple.

Ultimately I want to split the text into 2 columns at the second whitespace.

Upvotes: 0

Views: 838

Answers (2)

Ronak Shah

Reputation: 388862

To split the data into two columns at the second whitespace, you can try tidyr::extract.

df <- data.frame(txt = "the duck is yellow")
tidyr::extract(df, txt, c('first', 'second'), '(\\w+\\s\\w+)\\s(.*)')

#     first    second
#1 the duck is yellow

Or with strcapture in base R:

strcapture('(\\w+\\s\\w+)\\s(.*)', df$txt, 
           proto = list(first = character(), second = character()))
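The same capture-group idea can be extended from the second whitespace to the nth one by building the pattern with a quantifier. The following is only a sketch: the helper name split_nth_ws and its argument n are made up for illustration, it treats any run of whitespace as a single separator, and it uses \\S+ rather than \\w+ so that punctuation counts as part of a word.

```r
# Hypothetical helper: split each string at its nth whitespace run.
# Returns a data frame with columns 'first' and 'second' (NA where no match).
split_nth_ws <- function(x, n) {
  # (n - 1) repetitions of "word + whitespace", then one more word, form the
  # first group; everything after the next whitespace run is the second group.
  pattern <- sprintf("^((?:\\S+\\s+){%d}\\S+)\\s+(.*)$", n - 1)
  strcapture(pattern, x,
             proto = data.frame(first = character(), second = character(),
                                stringsAsFactors = FALSE),
             perl = TRUE)
}

res <- split_nth_ws("the duck is yellow", 2)
res$first   # "the duck"
res$second  # "is yellow"
```

Strings with fewer than n whitespace runs do not match the pattern, so strcapture fills both columns with NA for them.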

Upvotes: 2

Tim Biegeleisen

Reputation: 520958

To split the string at the second space, I would suggest using sub:

txt <- "the duck is yellow"
first <- sub("^(\\w+ \\w+).*$", "\\1", txt)
second <- sub("^\\w+ \\w+\\s*", "", txt)
first
[1] "the duck"

second
[1] "is yellow"

But this approach could get unwieldy if you need to split at the nth space, buried somewhere inside the string. For a more general approach, we can use strsplit and then piece the terms back together:

parts <- strsplit(txt, " ")
pos <- 2
first <- paste(parts[[1]][1:pos], collapse=" ")                        # "the duck"
second <- paste(parts[[1]][(pos+1):length(parts[[1]])], collapse=" ")  # "is yellow"
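If what is needed is the position of the nth whitespace itself, as the question literally asks, one option is to locate every whitespace character with gregexpr and index into the result. This is a sketch for a single string only; the function name split_at_nth_space is made up for illustration, and it counts each whitespace character separately, so consecutive spaces are not merged.

```r
# Hypothetical helper: split one string at its nth whitespace character.
split_at_nth_space <- function(x, n) {
  # Start positions of every whitespace character; -1 if there are none
  hits <- gregexpr("\\s", x)[[1]]
  if (hits[1] == -1 || length(hits) < n) {
    # Fewer than n whitespace characters: nothing to split on
    return(c(first = x, second = NA_character_))
  }
  pos <- hits[n]  # character index of the nth whitespace
  c(first  = substr(x, 1, pos - 1),
    second = substr(x, pos + 1, nchar(x)))
}

res <- split_at_nth_space("the duck is yellow", 2)
res[["first"]]   # "the duck"
res[["second"]]  # "is yellow"
```

The early return also covers the case where n is larger than the number of spaces, which the (pos+1):length(...) indexing above does not guard against.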

Upvotes: 1
