Toni Massó Jou

Reputation: 1

Regular expression for the separate function of tidyr

I need to separate a column into two columns with tidyr.

The column contains text like: I am Sam. The text always has exactly two white spaces, and it can contain any other symbols: [a-z][0-9][!\ºª, etc...].

The problem is that I need to split it into two columns: column one: I am, and column two: Sam.

I can't find a regular expression to separate at the second blank space.

Could you help me please?

Upvotes: 0

Views: 1185

Answers (2)

lukeA

Reputation: 54287

As an alternative, given:

library(tidyr)
df <- data.frame(txt = "I am Sam")

you can use

separate(df, txt, c("a", "b"), sep="(?<=\\s\\S{1,100})\\s")
#      a   b
# 1 I am Sam

with separate using stringi::stri_split_regex (ICU engine), or

separate(df, txt, c("a", "b"), sep="^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE) 

with the older (?) separate using base::strsplit (Perl engine). See also

strsplit("I am Sam", "^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE)
# [[1]]
# [1] "I am" "Sam" 

But it might be a bit "esoteric"...

Upvotes: 3

akrun

Reputation: 887981

We can use extract from tidyr. We match zero or more characters and place them in a capture group ((.*)), followed by one or more whitespace characters (\\s+) and another capture group that contains only non-whitespace characters (\\S+), to separate the original column into two columns. Because .* is greedy, the first group takes as much as it can, so the split lands on the last run of whitespace.

library(tidyr)
extract(df1, Col1, into = c("Col1", "Col2"), "(.*)\\s+(\\S+)")
#   Col1 Col2
#1  I am  Sam
#2 He is  Sam

data

df1 <- data.frame(Col1 = c("I am Sam", "He is Sam"), stringsAsFactors=FALSE)
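
For readers without tidyr at hand, the same pattern can be tried in base R. This is only a sketch of the idea, not part of the answer above: the names x, pattern and result are made up for the illustration, and the anchors ^ and $ plus perl=TRUE are assumptions added so that sub() replaces the whole string with a single capture group.

```r
# Hypothetical input mirroring df1$Col1 from the answer above.
x <- c("I am Sam", "He is Sam")

# Greedy (.*) takes everything up to the last whitespace run;
# (\\S+) captures the final non-whitespace token.
pattern <- "^(.*)\\s+(\\S+)$"

# Replace the whole string with the first or second capture group.
result <- data.frame(
  Col1 = sub(pattern, "\\1", x, perl = TRUE),
  Col2 = sub(pattern, "\\2", x, perl = TRUE),
  stringsAsFactors = FALSE
)
print(result)
```

If a row had no whitespace at all, the pattern would not match and sub() would return the string unchanged in both columns, so input like that would need a separate check.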

Upvotes: 4
