Reputation: 3568
I have a character vector where some first names and surnames are separated by a space and some are not. I need to insert a space into the strings where the first name and surname run together. Each name begins with a capital letter.
For example, in
x <- c("John Lennon", "GeorgeHarrison", "RingoStarr")
I would like George and Ringo's names to be separated by a space while leaving John's as-is.
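Since every name starts with a capital, a missing space shows up as a lowercase letter immediately followed by an uppercase one. A minimal sketch of checking that condition with grepl(), assuming plain ASCII names (needs_space is just an illustrative variable name):

```r
x <- c("John Lennon", "GeorgeHarrison", "RingoStarr")

# TRUE where a lowercase letter is directly followed by a capital,
# i.e. where first name and surname run together.
needs_space <- grepl("[[:lower:]][[:upper:]]", x)
needs_space
# [1] FALSE  TRUE  TRUE

x[needs_space]
```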
After searching SO, I tried
gsub("([[:upper:]][[:lower:]])", "\\1 \\2", x)
but that yielded
"Jo hn Le nnon" "Ge orgeHa rrison" "Ri ngoSt arr"
To be honest, I don't have a clue what I'm doing when it comes to regular expressions (I bought a book on them from Amazon a minute ago, but can't wait that long).
Any help is much appreciated.
Upvotes: 0
Views: 130
Reputation: 34753
You can use a Perl-style look-ahead:
gsub("([[:lower:]])(?=[[:upper:]])", "\\1 ", x, perl = TRUE)
# [1] "John Lennon" "George Harrison" "Ringo Starr"
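To see why the look-ahead works, it helps to list what is actually matched. This sketch (matched and fixed are illustrative variable names) runs the same pattern through regmatches() and gregexpr(): each match is only the lowercase letter, because (?=[[:upper:]]) checks the next character without consuming it, so replacing that letter with "\\1 " inserts the space and leaves the capital in place.

```r
x <- c("John Lennon", "GeorgeHarrison", "RingoStarr")

# Each match is just the lowercase letter before a capital;
# the look-ahead does not include the capital in the match.
matched <- regmatches(x, gregexpr("[[:lower:]](?=[[:upper:]])", x, perl = TRUE))
matched

# Replacing that letter with itself plus a space inserts the break.
fixed <- gsub("([[:lower:]])(?=[[:upper:]])", "\\1 ", x, perl = TRUE)
fixed
# [1] "John Lennon"     "George Harrison" "Ringo Starr"
```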
Explore this pattern on regex101 for more, and read up on look-around assertions.
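Upon further inspection of your attempt, you made two crucial mistakes: you swapped [:upper:] and [:lower:] (the break you want is a lowercase letter followed by an uppercase one, not the other way round), and you put both classes inside a single capture group, so \\2 refers to a group that does not exist.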
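Listing what the pattern from the question actually matches makes the problem visible: it finds a capital followed by a lowercase letter, which is the start of every name rather than the boundary between names. A minimal sketch (attempt_matches is an illustrative variable name):

```r
x <- c("John Lennon", "GeorgeHarrison", "RingoStarr")

# Uppercase followed by lowercase: the first two letters of each name.
attempt_matches <- regmatches(x, gregexpr("[[:upper:]][[:lower:]]", x))
attempt_matches
# element 1: "Jo" "Le"; element 2: "Ge" "Ha"; element 3: "Ri" "St"
```

Each of these pairs then gets a space appended by the replacement, which is where "Jo hn Le nnon" comes from.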
Alternatively, a slight change to your own approach also works:
gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", x)
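A quick sketch confirming that the two-group version gives the same result as the look-ahead version on this input (two_groups and look_ahead are illustrative variable names). Here both characters are consumed, so both must be written back: \\1 is the last letter of the first name and \\2 is the capital that starts the surname. This one runs with R's default regex engine, so perl = TRUE is not needed.

```r
x <- c("John Lennon", "GeorgeHarrison", "RingoStarr")

# Two capture groups, both reinserted with a space between them.
two_groups <- gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", x)

# Look-ahead version, for comparison.
look_ahead <- gsub("([[:lower:]])(?=[[:upper:]])", "\\1 ", x, perl = TRUE)

two_groups
# [1] "John Lennon"     "George Harrison" "Ringo Starr"
identical(two_groups, look_ahead)
# [1] TRUE
```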
Upvotes: 2