Reputation: 59
I'm stumped on a regex pattern where I need to select the 2nd whitespace of a string. I have gone through the first 5 pages of Google results, and all I find is how to select everything up to the 2nd whitespace. I just want to select the 2nd whitespace itself.
This is what I have so far.
txt <- "the duck is yellow"
str_extract(txt,"(?:[\\w]*)(?:[\\s])(?:[\\w]*)([\\s])")
Another regex I tried was:
(\w+\s\w+\s){2}
I am just not able to find a source that explains how to get the second occurrence of a certain character. I thought something like this would be simple.
Ultimately I want to split the text into 2 columns at the second whitespace.
Upvotes: 0
Views: 838
Reputation: 388862
To divide the data into two columns, splitting on the second whitespace, you can try using tidyr::extract:
df <- data.frame(txt = "the duck is yellow")
tidyr::extract(df, txt, c('first', 'second'), '(\\w+\\s\\w+)\\s(.*)')
#      first    second
# 1 the duck is yellow
Or with strcapture using base R:
strcapture('(\\w+\\s\\w+)\\s(.*)', df$txt,
           proto = list(first = character(), second = character()))
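If the position of the second whitespace itself is needed, as the question literally asks, a base R sketch with gregexpr might look like the following. The variable names are illustrative, and it assumes the string contains at least two whitespace characters; with fewer, ws_pos[2] is NA.

```r
# Sketch: locate the second whitespace character, then split around it.
txt <- "the duck is yellow"

# gregexpr() returns the start position of every match of "\\s" in txt
ws_pos <- gregexpr("\\s", txt)[[1]]
second_ws <- ws_pos[2]   # position of the 2nd whitespace (NA if there is none)

# Everything before and everything after that position
first  <- substr(txt, 1, second_ws - 1)
second <- substr(txt, second_ws + 1, nchar(txt))
```

Working from the match position avoids having to describe the text around the whitespace in the pattern at all.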
Upvotes: 2
Reputation: 520958
To split the string at the second space, I might suggest using sub here:
txt <- "the duck is yellow"
first <- sub("^(\\w+ \\w+).*$", "\\1", txt)
second <- sub("^\\w+ \\w+\\s*", "", txt)
first
[1] "the duck"
second
[1] "is yellow"
But this approach could get unwieldy if you needed to split at the nth space, buried somewhere inside the string. For a more general approach, we can try using strsplit, and then piece together the terms:
parts <- strsplit(txt, " ")
pos <- 2
first <- paste(parts[[1]][1:pos], collapse=" ") # "the duck"
second <- paste(parts[[1]][(pos+1):length(parts[[1]])], collapse=" ") # "is yellow"
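The strsplit approach can be wrapped in a small helper so that pos becomes a parameter. This is only a sketch: split_at_nth_space is a hypothetical name, it handles a single string, and it returns NA for the second part when there is nothing left to split off (without that guard, (pos+1):length(...) would count backwards).

```r
# Hypothetical helper: split a single string at its n-th space.
split_at_nth_space <- function(x, n) {
  parts <- strsplit(x, " ", fixed = TRUE)[[1]]
  if (length(parts) <= n) {
    # Not enough pieces after splitting: nothing to put in the second part
    return(c(first = x, second = NA_character_))
  }
  c(first  = paste(head(parts, n), collapse = " "),   # first n words
    second = paste(tail(parts, -n), collapse = " "))  # everything after them
}

res <- split_at_nth_space("the duck is yellow", 2)
short <- split_at_nth_space("the duck", 2)
```

Note that runs of consecutive spaces produce empty strings in parts, so this assumes words are separated by single spaces.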
Upvotes: 1