I have a character vector that looks like this, and I'd like to split each element wherever a lowercase letter is directly followed by an uppercase letter:
library(stringr)
str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel", "Basil LeafBarbeque SauceFried Beef")
str_split(str, '[a-z][A-Z]', n = 3)
[[1]]
[1] "Fruit Loop" "alapeno Sandwich"
[[2]]
[1] "Red Bagel"
[[3]]
[1] "Basil Lea" "arbeque Sauc" "ried Beef"
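But the split consumes the two matched letters, so "Loops" becomes "Loop" and "Jalapeno" becomes "alapeno". I need to keep the last letter of each phrase and the first letter of the next one.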
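The letters disappear because `[a-z][A-Z]` matches two real characters, and a split throws away whatever the pattern matches. A quick base-R check (a sketch using the same `str` vector) shows exactly which characters get consumed:

```r
str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel",
         "Basil LeafBarbeque SauceFried Beef")

# Extract the text matched by the split pattern; these are the
# characters that str_split() discards from the output
consumed <- regmatches(str, gregexpr("[a-z][A-Z]", str))
consumed
```

Every match spans the boundary, so any pattern that must keep both letters has to match zero characters (or you have to re-insert them).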
You could also match the phrases instead of splitting, based on your strings (this pattern assumes each phrase is exactly two capitalized words):
unlist(regmatches(str, gregexpr('[A-Z][a-z]+ [A-Z][a-z]+', str)))
# [1] "Fruit Loops" "Jalapeno Sandwich" "Red Bagel"
# [4] "Basil Leaf" "Barbeque Sauce" "Fried Beef"
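If a phrase can have more or fewer than two words, one possible variation is to match a capitalized word followed by any number of space-separated capitalized words. This is a sketch that assumes every word starts with one uppercase letter followed by lowercase letters, and that words inside a phrase are separated by single spaces:

```r
str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel",
         "Basil LeafBarbeque SauceFried Beef")

# One capitalized word, then zero or more " Capitalized" words.
# [a-z]+ stops at the next uppercase letter, and the group requires a
# leading space, so a glued-on phrase like "Jalapeno" starts a new match.
phrases <- unlist(regmatches(str, gregexpr("[A-Z][a-z]+( [A-Z][a-z]+)*", str)))
phrases
```

Because the repeated group is optional, a one-word phrase glued to the next one (e.g. "Cheese" in "CheesePlain Toast") is also picked up as its own match.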
Here are two approaches in base R (you can generalize them to stringr if you want).
The first inserts a placeholder between each lowercase-uppercase pair and then splits on the placeholder:
strsplit(gsub("([a-z])([A-Z])", "\\1SPLITHERE\\2", str), "SPLITHERE")
## [[1]]
## [1] "Fruit Loops" "Jalapeno Sandwich"
##
## [[2]]
## [1] "Red Bagel"
##
## [[3]]
## [1] "Basil Leaf" "Barbeque Sauce" "Fried Beef"
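The placeholder only has to be a string that never occurs in the data. A minimal variant, assuming the input never contains the `\001` control character, uses that character as the marker and splits on it literally with `fixed = TRUE`, so the marker is never treated as a regex. The helper name `split_at_case_change` is hypothetical:

```r
str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel",
         "Basil LeafBarbeque SauceFried Beef")

# Hypothetical helper: insert a marker between a lowercase and an
# uppercase letter, then split on that marker as a literal string.
# Assumes the marker ("\001" by default) never appears in the input.
split_at_case_change <- function(x, marker = "\001") {
  marked <- gsub("([a-z])([A-Z])", paste0("\\1", marker, "\\2"), x)
  strsplit(marked, marker, fixed = TRUE)
}

res <- split_at_case_change(str)
res
```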
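The second splits at the zero-width position between a lowercase and an uppercase letter, using a lookbehind and a lookahead (these need perl = TRUE):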
strsplit(str, "(?<=[a-z])(?=[A-Z])", perl=TRUE)
## [[1]]
## [1] "Fruit Loops" "Jalapeno Sandwich"
##
## [[2]]
## [1] "Red Bagel"
##
## [[3]]
## [1] "Basil Leaf" "Barbeque Sauce" "Fried Beef"
EDIT: Generalized to stringr, so you can cap the result at 3 pieces if you want:
stringr::str_split(gsub("([a-z])([A-Z])", "\\1SPLITHERE\\2", str), "SPLITHERE", n = 3)
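If you want a piece limit without loading stringr, one base-R sketch is to split with the lookaround pattern and then paste everything from the n-th piece onward back together, which mimics how `str_split()` leaves the unsplit remainder in its last piece. The function name `split_case_n` is made up for this example:

```r
str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel",
         "Basil LeafBarbeque SauceFried Beef")

# Hypothetical helper: split at every lowercase-to-uppercase boundary,
# then keep at most n pieces, pasting the rest into the last one.
split_case_n <- function(x, n = Inf) {
  pieces <- strsplit(x, "(?<=[a-z])(?=[A-Z])", perl = TRUE)
  lapply(pieces, function(p) {
    if (length(p) <= n) return(p)
    # The lookaround split consumes no characters, so pasting the
    # trailing pieces with collapse = "" restores the original text.
    c(p[seq_len(n - 1)], paste(p[n:length(p)], collapse = ""))
  })
}

split_case_n(str, n = 2)
```

With `n = 2` the third element comes back as "Basil Leaf" and "Barbeque SauceFried Beef"; the default `n = Inf` returns every piece.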