How can I split the following string?
"Wes Anderson – The Grand Budapest HotelRichard Linklater – BoyhoodBennett Miller – FoxcatcherMorten Tyldum – The Imitation Game"
into:
"Wes Anderson – The Grand Budapest Hotel"
"Richard Linklater – Boyhood"
"Bennett Miller – Foxcatcher"
"Morten Tyldum – The Imitation Game"
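The first split point is "HotelRichard", so I think the pattern [a-z][A-Z] (a lowercase letter followed by an uppercase letter) could be used to find the split points. But if I split on that pattern using:
strsplit("HotelRichard", "[a-z][A-Z]")
it returns "Hote" "ichard": the matched "l" and "R" are dropped along with the split.
Any good ideas for how to split without losing those letters?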
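To see why letters go missing, it helps to look at what `[a-z][A-Z]` actually matches. The sketch below uses base R's `gregexpr()` and `regmatches()` (the variable names `s` and `consumed` are only for illustration) to list each match. Every match is two characters wide, and `strsplit()` throws the whole match away, so one lowercase and one uppercase letter disappear at each split point.

```r
s <- "Wes Anderson – The Grand Budapest HotelRichard Linklater – BoyhoodBennett Miller – FoxcatcherMorten Tyldum – The Imitation Game"

# Each match covers the last letter of one title and the first
# letter of the next director's name.
consumed <- regmatches(s, gregexpr("[a-z][A-Z]", s))[[1]]
consumed
## [1] "lR" "dB" "rM"
```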
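Here's an approach using a single regex that combines a lookbehind and a lookahead:
x <- "Wes Anderson – The Grand Budapest HotelRichard Linklater – BoyhoodBennett Miller – FoxcatcherMorten Tyldum – The Imitation Game"
strsplit(x, "(?<=[a-z])(?=[A-Z])", perl = TRUE)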
## [[1]]
## [1] "Wes Anderson – The Grand Budapest Hotel"
## [2] "Richard Linklater – Boyhood"
## [3] "Bennett Miller – Foxcatcher"
## [4] "Morten Tyldum – The Imitation Game"
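The lookarounds work because they match an empty position between a lowercase and an uppercase letter rather than the letters themselves, so nothing is removed. A minimal sketch, assuming you want a plain character vector instead of a list; `split_titles()` is a hypothetical helper name, not part of any package. The `regexpr()` call shows that the match really has length zero.

```r
# Hypothetical helper: split wherever a lowercase letter is directly
# followed by an uppercase letter, without consuming either letter.
split_titles <- function(x) {
  # perl = TRUE is needed: the default regex engine has no lookbehind
  unlist(strsplit(x, "(?<=[a-z])(?=[A-Z])", perl = TRUE))
}

x <- "Wes Anderson – The Grand Budapest HotelRichard Linklater – BoyhoodBennett Miller – FoxcatcherMorten Tyldum – The Imitation Game"
films <- split_titles(x)
length(films)
## [1] 4

# The first match is an empty position, not a pair of characters.
m <- regexpr("(?<=[a-z])(?=[A-Z])", x, perl = TRUE)
attr(m, "match.length")
## [1] 0
```

Note that `unlist()` flattens the result, so if `x` holds several strings, all of their pieces end up in one vector.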
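First insert a marker between the director/film mashups, then split the string at the inserted "xxx". The first step captures two groups (a lowercase letter and the uppercase letter that follows it) and writes them back with the three x's in between. This uses the stringr package: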
library(stringr)
text <- "Wes Anderson – The Grand Budapest HotelRichard Linklater – BoyhoodBennett Miller – FoxcatcherMorten Tyldum – The Imitation Game"
# Insert "xxx" between each lowercase letter and the uppercase letter after it
text.split <- str_replace_all(text, "([a-z])([A-Z])", "\\1xxx\\2")
text.final <- str_split(text.split, "xxx")
text.final
[[1]]
[1] "Wes Anderson – The Grand Budapest Hotel" "Richard Linklater – Boyhood"
[3] "Bennett Miller – Foxcatcher" "Morten Tyldum – The Imitation Game"
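The replacement string relies on backreferences: `\\1` and `\\2` stand for whatever the first and second parenthesised groups matched, so both letters are written back around the marker instead of being lost. The same syntax works in base R's `gsub()`; the short sketch below uses base R only to avoid the stringr dependency and runs on a small string so the effect is easy to see.

```r
# Group 1 captures the lowercase letter, group 2 the uppercase one;
# both are re-inserted, with "xxx" placed between them.
marked <- gsub("([a-z])([A-Z])", "\\1xxx\\2", "HotelRichard")
marked
## [1] "HotelxxxRichard"

# Splitting on the marker (taken literally) keeps both words whole,
# giving c("Hotel", "Richard").
pieces <- strsplit(marked, "xxx", fixed = TRUE)[[1]]
```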
As a workaround, you can insert a § sign between each lowercase letter and the uppercase letter that follows it (assuming § rarely, if ever, appears in your input) and then split on it:
x <- "Wes Anderson – The Grand Budapest HotelRichard Linklater – BoyhoodBennett Miller – FoxcatcherMorten Tyldum – The Imitation Game"
x <- gsub("([a-z])([A-Z])","\\1§\\2",x)
strsplit(x,"§")
Sample program output:
[[1]]
[1] "Wes Anderson – The Grand Budapest Hotel"
[2] "Richard Linklater – Boyhood"
[3] "Bennett Miller – Foxcatcher"
[4] "Morten Tyldum – The Imitation Game"
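If you are not sure that § never occurs in your data, you can check for the marker before using it. A minimal sketch of that idea: `split_on_case_change()` is a hypothetical name, and the default marker `"\u0001"` (a control character that is unlikely in ordinary text) is an assumption, not part of the answer above.

```r
# Hypothetical helper: insert a marker between each lowercase/uppercase
# pair, then split on it. Stops if the marker already occurs in x.
split_on_case_change <- function(x, marker = "\u0001") {
  if (any(grepl(marker, x, fixed = TRUE))) {
    stop("marker already occurs in the input; choose a different one")
  }
  marked <- gsub("([a-z])([A-Z])", paste0("\\1", marker, "\\2"), x)
  # fixed = TRUE treats the marker literally rather than as a regex
  unlist(strsplit(marked, marker, fixed = TRUE))
}

x <- "Wes Anderson – The Grand Budapest HotelRichard Linklater – BoyhoodBennett Miller – FoxcatcherMorten Tyldum – The Imitation Game"
split_on_case_change(x)
```

The guard costs one extra scan of the input, but it turns a silent mis-split into an explicit error.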