Reputation: 29
I have a string that I would like to split into several strings.
library(stringr)
testString <- "SMITH, Klaus, text, text, SMITH, Samantha, text, text, MUELLER, Klaus, text, text, MUELLER, Klara, text, text"
Whenever a new word is completely capitalised (followed by a comma) it should start a new string. At the end it should look like this:
[1] "SMITH, Klaus, text, text,"
[2] "SMITH, Samantha, text, text,"
[3] "MUELLER, Klaus, text, text,"
[4] "MUELLER, Klara, text, text"
I have tried various patterns with strsplit, but I can't get R to match a complete capitalised word (which can have any number of letters) rather than a single letter, and then split the string at that point.
strsplit(testString, "(?!^)(?<=[[:upper:]]{2})", perl = TRUE)
Upvotes: 0
Views: 115
Reputation: 886948
Use a regex lookahead: split on one or more whitespace characters (\\s+) that precede one or more uppercase letters followed by a comma ((?=[A-Z]+,)).
strsplit(testString, "\\s+(?=[A-Z]+,)", perl = TRUE)[[1]]
-output
[1] "SMITH, Klaus, text, text,"
[2] "SMITH, Samantha, text, text,"
[3] "MUELLER, Klaus, text, text,"
[4] "MUELLER, Klara, text, text"
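Because the lookahead is zero-width, only the whitespace is consumed by the split, so each surname stays at the start of its piece. As a minimal base-R sketch of the same idea, the snippet below also strips the trailing comma from each piece; the names parts and parts_clean are only illustrative and the cleanup step is an optional extra, not part of the answer above.

```r
testString <- "SMITH, Klaus, text, text, SMITH, Samantha, text, text, MUELLER, Klaus, text, text, MUELLER, Klara, text, text"

# Split on whitespace that is immediately followed by an all-uppercase
# word and a comma. The lookahead (?=...) consumes nothing, so the
# surname itself is kept at the start of the next piece.
parts <- strsplit(testString, "\\s+(?=[A-Z]+,)", perl = TRUE)[[1]]

# Optional cleanup (an assumption about what is wanted, not part of the
# original answer): remove the comma left at the end of each piece.
parts_clean <- sub(",$", "", parts)

print(parts_clean)
```

Note that [A-Z] only covers ASCII capitals, so a surname such as "MÜLLER" would need a broader character class.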
Upvotes: 3