Eugene Brown

Reputation: 4362

Splitting a string where an upper-case letter follows a lower-case letter, using stringr

I have a string vector that looks like this and I'd like to split it up:

library(stringr)

str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel", "Basil LeafBarbeque SauceFried Beef")

str_split(str, '[a-z][A-Z]', n = 3)

[[1]]
[1] "Fruit Loop"       "alapeno Sandwich"

[[2]]
[1] "Red Bagel"

[[3]]
[1] "Basil Lea"    "arbeque Sauc" "ried Beef"

But the split consumes the letter on each side of the boundary, and I need to keep those letters at the end and start of the words.

Upvotes: 3

Views: 251

Answers (2)

hwnd

Reputation: 70722

Given the shape of your strings, you could also match the pieces directly instead of splitting.

unlist(regmatches(str, gregexpr('[A-Z][a-z]+ [A-Z][a-z]+', str)))
# [1] "Fruit Loops"       "Jalapeno Sandwich" "Red Bagel"        
# [4] "Basil Leaf"        "Barbeque Sauce"    "Fried Beef" 
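
The pattern above expects every item to be exactly two words. If some items could be one word or three, a looser pattern might help. This is only a sketch, assuming each item is a run of capitalized words separated by single spaces; the `items` vector and its contents are made up for illustration.

```r
# Hypothetical sample data: items of one, two, and three words.
items <- c("SaltFried Green Beans", "Red Bagel")

# One capitalized word, optionally followed by more capitalized words,
# each preceded by a single space. perl = TRUE enables the (?: ) group.
pattern <- "[A-Z][a-z]+(?: [A-Z][a-z]+)*"

pieces <- unlist(regmatches(items, gregexpr(pattern, items, perl = TRUE)))
print(pieces)
```

Like the original pattern, this one silently drops anything that does not fit the assumed shape (digits, hyphens, all-caps words), so it is worth checking against the real data.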

Upvotes: 3

Tyler Rinker

Reputation: 109844

Here are two approaches in base R (you can generalize them to stringr if you want).

The first inserts a placeholder at each boundary where a lower-case letter is followed by an upper-case letter, and then splits on that placeholder.

strsplit(gsub("([a-z])([A-Z])", "\\1SPLITHERE\\2", str), "SPLITHERE")

## [[1]]
## [1] "Fruit Loops"       "Jalapeno Sandwich"
## 
## [[2]]
## [1] "Red Bagel"
## 
## [[3]]
## [1] "Basil Leaf"     "Barbeque Sauce" "Fried Beef"  
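
One caveat with a placeholder is that a word like "SPLITHERE" could, in principle, occur in the data. A possible variation, sketched here under the assumption that the input never contains the control character `"\001"`, wraps the same idea in a helper. The name `split_on_case_boundary` and its `marker` argument are hypothetical, not part of base R or stringr.

```r
# Hypothetical helper: split wherever a lowercase letter is directly
# followed by an uppercase letter, keeping both letters.
split_on_case_boundary <- function(x, marker = "\001") {
  # Insert the marker between the two captured letters.
  # (Assumes the marker contains no backslashes, which gsub would interpret.)
  marked <- gsub("([a-z])([A-Z])", paste0("\\1", marker, "\\2"), x)
  # fixed = TRUE treats the marker as a literal string, not a regex.
  strsplit(marked, marker, fixed = TRUE)
}

items <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel")
result <- split_on_case_boundary(items)
print(result)
```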

The second uses a lookbehind and a lookahead, so the split point is zero-width and no characters are consumed:

strsplit(str, "(?<=[a-z])(?=[A-Z])", perl=TRUE)

## [[1]]
## [1] "Fruit Loops"       "Jalapeno Sandwich"
## 
## [[2]]
## [1] "Red Bagel"
## 
## [[3]]
## [1] "Basil Leaf"     "Barbeque Sauce" "Fried Beef"  
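
To see that the lookaround split really keeps every character, one can glue the pieces back together and compare with the input. A small sketch of that check (the `items` vector simply reuses two strings from the question):

```r
items <- c("Fruit LoopsJalapeno Sandwich", "Basil LeafBarbeque SauceFried Beef")

# (?<=[a-z]) requires a lowercase letter just before the split point;
# (?=[A-Z]) requires an uppercase letter just after it.
# Neither assertion consumes a character.
pieces <- strsplit(items, "(?<=[a-z])(?=[A-Z])", perl = TRUE)

# Gluing each element's pieces back together should give the input again.
rebuilt <- vapply(pieces, paste, character(1), collapse = "")
print(identical(rebuilt, items))
```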

EDIT: Generalized to stringr, so you can cap the result at 3 pieces if you want:

stringr::str_split(gsub("([a-z])([A-Z])", "\\1SPLITHERE\\2", str), "SPLITHERE", 3)
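
It should also be possible to hand the lookaround pattern to stringr directly and skip the placeholder step. A minimal sketch, assuming stringr is installed and relying on its regex engine (ICU) supporting lookbehind and lookahead; `items` is just a subset of the question's data:

```r
library(stringr)

items <- c("Fruit LoopsJalapeno Sandwich", "Basil LeafBarbeque SauceFried Beef")

# Zero-width split at each lowercase-to-uppercase boundary.
all_pieces <- str_split(items, "(?<=[a-z])(?=[A-Z])")
print(all_pieces)

# n caps the number of pieces per element; the remainder stays in the last one.
two_pieces <- str_split(items, "(?<=[a-z])(?=[A-Z])", n = 2)
print(two_pieces)
```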

Upvotes: 5
