Reputation: 1
I need to separate one column into two with tidyr.
The column contains text like: I am Sam. The text always has exactly two spaces, and it can contain any other symbols: [a-z][0-9][!\ºª, etc...].
I need to split it into two columns: column one is I am, and column two is Sam.
I can't find a regular expression that splits at the second space.
Could you help me, please?
Upvotes: 0
Views: 1185
Reputation: 54287
As an alternative, given:
library(tidyr)
df <- data.frame(txt = "I am Sam")
you can use
separate(df, txt, c("a", "b"), sep="(?<=\\s\\S{1,100})\\s")
#      a   b
# 1 I am Sam
with a separate that uses stringi::stri_split_regex (ICU engine), or
separate(df, txt, c("a", "b"), sep="^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE)
with the older (?) separate that uses base::strsplit (Perl engine). See also
strsplit("I am Sam", "^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE)
# [[1]]
# [1] "I am" "Sam"
But it might be a bit "esoterique"...
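To make the Perl pattern less opaque, here is a minimal base R sketch that locates where it actually matches. The first alternative consumes everything up to and including the first space and then discards it via (*SKIP)(*FAIL), so only a later space can match. The variable names (x, pattern, split_at, left, right) are illustrative and not part of the answer above.

```r
# Illustrative input; same string as in the answer above.
x <- "I am Sam"
pattern <- "^.*?\\s(*SKIP)(*FAIL)|\\s"

# gregexpr() with perl = TRUE reports the start position of every match.
m <- gregexpr(pattern, x, perl = TRUE)[[1]]
split_at <- as.integer(m)

# Cut the string around the single matched space.
left  <- substr(x, 1, split_at - 1)
right <- substr(x, split_at + 1, nchar(x))
print(c(left, right))
```

With exactly two spaces in the input, as the question guarantees, the pattern should match only the second one, which is why it works as a sep for separate.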
Upvotes: 3
Reputation: 887981
We can use extract from tidyr. We match zero or more characters and place them in a capture group ((.*)), followed by one or more whitespace characters (\\s+) and another capture group that contains only non-whitespace characters (\\S+), to separate the original column into two columns. Because (.*) is greedy, the split happens at the last run of whitespace.
library(tidyr)
# sample data
df1 <- data.frame(Col1 = c("I am Sam", "He is Sam"), stringsAsFactors=FALSE)
extract(df1, Col1, into = c("Col1", "Col2"), "(.*)\\s+(\\S+)")
#    Col1 Col2
#1  I am  Sam
#2 He is  Sam
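If tidyr is not available, the same two capture groups can be tried with base R's sub. This is only a sketch of the idea, not the answer's method; the vector x, the anchored pattern, and the third sample value (added to show that the greedy first group splits at the last space) are assumptions for illustration.

```r
# Hypothetical sample input; "a b c d" has more than two spaces on purpose.
x <- c("I am Sam", "He is Sam", "a b c d")

# Same groups as in the extract() call, anchored to the whole string.
pattern <- "^(.*)\\s+(\\S+)$"

# Keep the first group for column one and the second group for column two.
first  <- sub(pattern, "\\1", x)
second <- sub(pattern, "\\2", x)

res <- data.frame(Col1 = first, Col2 = second, stringsAsFactors = FALSE)
print(res)
```

Strings that contain no whitespace do not match the pattern, so sub returns them unchanged in both columns; that case would need separate handling.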
Upvotes: 4