Reputation: 44658
YARQ (Yet another regex question).
How would I go about splitting the following into two columns, so that the last column contains the last word in each sentence and the first column contains everything else?
x <- c("This is a test",
"Testing 1,2,3 Hello",
"Foo Bar",
"Random 214274(%*(^(* Sample",
"Some Hyphenated-Thing"
)
Such that I end up with:
col1                   col2
This is a              test
Testing 1,2,3          Hello
Foo                    Bar
Random 214274(%*(^(*   Sample
Some                   Hyphenated-Thing
Upvotes: 5
Views: 522
Reputation:
This might not be exactly what you asked for, but in case anyone is wondering how to do this in Python:
line = "This is a test"  # example input; any one of the strings above
# col1:
print(line.split(" ")[:-1])
# col2:
print(line.split(" ")[-1])
Note that col1 will get printed as a list, which you can make into a string like this:
# col1:
print(" ".join(line.split(" ")[:-1]))
Upvotes: 0
Reputation: 93908
Here's a go using strsplit:
do.call(rbind,
  lapply(
    strsplit(x, " "),
    function(y)
      cbind(paste(head(y, length(y) - 1), collapse = " "), tail(y, 1))
  )
)
Or an alternative implementation using sapply:
t(
  sapply(
    strsplit(x, " "),
    function(y) cbind(paste(head(y, length(y) - 1), collapse = " "), tail(y, 1))
  )
)
Resulting in:
[,1] [,2]
[1,] "This is a" "test"
[2,] "Testing 1,2,3" "Hello"
[3,] "Foo" "Bar"
[4,] "Random 214274(%*(^(*" "Sample"
[5,] "Some" "Hyphenated-Thing"
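If you want the result shape to be checked for you, the same split-and-paste idea can be sketched with vapply. This is only a sketch: the name res and the column labels are my own choices, and it assumes every string contains at least one space.

```r
x <- c("This is a test",
       "Testing 1,2,3 Hello",
       "Foo Bar",
       "Random 214274(%*(^(* Sample",
       "Some Hyphenated-Thing")

# Split on literal spaces, then rebuild "everything but the last word"
# and "the last word" for each element. vapply() insists that each
# result is a character vector of length 2.
res <- t(vapply(
  strsplit(x, " ", fixed = TRUE),
  function(y) c(paste(head(y, -1), collapse = " "), tail(y, 1)),
  character(2)
))
colnames(res) <- c("col1", "col2")  # labels taken from the question
res
```

head(y, -1) drops the last element, which is equivalent to head(y, length(y) - 1) above.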
Upvotes: 4
Reputation: 15405
This looks like a job for a lookahead. We'll split on the one space that is followed only by non-space characters up to the end of the string.
split <- strsplit(x, " (?=[^ ]+$)", perl=TRUE)
matrix(unlist(split), ncol=2, byrow=TRUE)
[,1] [,2]
[1,] "This is a" "test"
[2,] "Testing 1,2,3" "Hello"
[3,] "Foo" "Bar"
[4,] "Random 214274(%*(^(*" "Sample"
[5,] "Some" "Hyphenated-Thing"
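Since the question asks for named columns, here is one way the lookahead split could be turned into a data frame. Treat it as a sketch: parts, m and df are names I made up, and it assumes every string has at least one space (a single-word string would give a piece of length 1 and rbind would recycle it).

```r
x <- c("This is a test",
       "Testing 1,2,3 Hello",
       "Foo Bar",
       "Random 214274(%*(^(* Sample",
       "Some Hyphenated-Thing")

# Split only at the last space: the lookahead requires that nothing
# but non-space characters follow it before the end of the string.
parts <- strsplit(x, " (?=[^ ]+$)", perl = TRUE)

# Stack the two-element pieces row by row, then name the columns.
m <- do.call(rbind, parts)
df <- data.frame(col1 = m[, 1], col2 = m[, 2], stringsAsFactors = FALSE)
df
```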
Upvotes: 9
Reputation: 60150
Assuming "words" are alphanumeric (the last word in this case is one or more letters \\w or digits \\d; you can add more classes if necessary):
col_one = gsub("(.*)(\\b[\\w\\d]+)$", "\\1", x, perl=TRUE)
col_two = gsub("(.*)(\\b[\\w\\d]+)$", "\\2", x, perl=TRUE)
Output:
> col_one
[1] "This is a " "Testing 1,2,3 " "Foo "
[4] "Random 214274(%*(^(* "
> col_two
[1] "test" "Hello" "Bar" "Sample"
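If the last "word" should be anything that isn't whitespace (so a hyphenated word stays in one piece, whereas a word-character pattern would stop at the hyphen), a variant with sub might look like the following. It is a sketch, not a replacement for the above: it assumes words are separated by whitespace and that there is no trailing whitespace.

```r
x <- c("This is a test",
       "Testing 1,2,3 Hello",
       "Foo Bar",
       "Random 214274(%*(^(* Sample",
       "Some Hyphenated-Thing")

# col_one: remove the final run of whitespace plus the last run of
# non-whitespace characters at the end of the string.
col_one <- sub("\\s+\\S+$", "", x, perl = TRUE)

# col_two: greedy .* consumes everything up to and including the last
# whitespace character, leaving only the final word.
col_two <- sub("^.*\\s", "", x, perl = TRUE)

col_one
col_two
```

Unlike the capture-group version, col_one here has no trailing space.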
Upvotes: 1