JuusoT

Reputation: 65

Splitting string between capital and lowercase character in R?

I have a vector of character strings:

v1 <- c("Firstname LastnameFirstname Lastname", 
"Firstname Lastname", 
"Firstname Lastname", 
"Firstname LastnameFirstname Lastname")

I'd like to split each string at the point where a lowercase letter is immediately followed by a capital letter, retaining both letters.

The desired output would be:

[1] "Firstname Lastname" "Firstname Lastname" "Firstname Lastname" "Firstname Lastname" "Firstname Lastname" "Firstname Lastname"

Following examples on Stack Exchange, I tried strsplit combined with gsub:

unlist(strsplit( gsub("([a-z][A-Z])","\\1~",v1), "~" ))

but this splits after the matched pair of characters rather than between them:

[1] "Firstname LastnameF" "irstname Lastname"   "Firstname Lastname"  "Firstname Lastname"  "Firstname LastnameF" "irstname Lastname"  

How do I split between the two characters while still retaining both of them?

Upvotes: 4

Views: 1988

Answers (1)

akrun

Reputation: 886948

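We can use regex lookarounds to split at the zero-width position that is preceded by a lowercase letter (positive lookbehind, (?<=[a-z])) and followed by an uppercase letter (positive lookahead, (?=[A-Z])). Lookarounds consume no characters, so both letters are retained. perl = TRUE is required because R's default regex engine does not support lookarounds.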

unlist(strsplit(v1, "(?<=[a-z])(?=[A-Z])", perl = TRUE))
#[1] "Firstname Lastname" "Firstname Lastname" "Firstname Lastname" 
#[4] "Firstname Lastname" "Firstname Lastname" "Firstname Lastname"
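
If lookarounds are not an option, the gsub approach from the question can probably be repaired by capturing the two letters in separate groups and placing the marker between them. This is only a sketch: it assumes that "~" never occurs in the data, and the names marked and res are made up for illustration.

```r
# Same input vector as in the question
v1 <- c("Firstname LastnameFirstname Lastname",
        "Firstname Lastname",
        "Firstname Lastname",
        "Firstname LastnameFirstname Lastname")

# Capture the lowercase and the uppercase letter separately and put a
# marker between them; "~" is assumed not to appear in the data
marked <- gsub("([a-z])([A-Z])", "\\1~\\2", v1)

# Split on the marker; fixed = TRUE treats "~" as a literal string
res <- unlist(strsplit(marked, "~", fixed = TRUE))
res
```

On this input it should give the same six elements as the lookaround version. The lookaround version is shorter and does not depend on choosing a marker character that is absent from the strings.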

Upvotes: 10
