jay.sf

Reputation: 72984

How to use `strsplit` before every capital letter of a camel case?

I want to use strsplit to split before every capital letter, using a positive lookahead. However, it also splits after every capital letter, and I'm confused about that. Is this regex incompatible with strsplit? Why is that so, and what needs to change?

strsplit('AaaBbbCcc', '(?=\\p{Lu})', perl=TRUE)[[1]]
strsplit('AaaBbbCcc', '(?=[A-Z])', perl=TRUE)[[1]]
strsplit('AaaBbbCcc', '(?=[ABC])', perl=TRUE)[[1]]
# [1] "A"  "aa" "B"  "bb" "C"  "cc"

Expected result:

# [1] "Aaa" "Bbb" "Ccc"

In the Demo it actually looks fine.

Ideally it should split only at a camel-case boundary, i.e. before Aa but not before AA; there's \\p{Lt}, but this doesn't seem to work at all.

strsplit('AaaABbbBCcc', '(?=\\p{Lt})', perl=TRUE)[[1]]
# [1] "AaaABbbBCcc"

Expected result:

# [1] "AaaA" "BbbB" "Ccc" 

Upvotes: 5

Views: 123

Answers (1)

Giulio Mattolin

Reputation: 650

It seems that by adding (?!^), which rejects a match at the very start of the string, you can obtain the desired result.

strsplit('AaaBbbCcc', "(?!^)(?=[A-Z])", perl=TRUE)[[1]]
# [1] "Aaa" "Bbb" "Ccc"

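For the camel case, we can require a lowercase letter right after the capital. Note that \\p{Lt} matches single titlecase characters such as 'ǅ', not a capital followed by a lowercase letter, which is why it finds nothing here.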

strsplit('AaaABbbBCcc', '(?!^)(?=\\p{Lu}\\p{Ll})', perl=TRUE)[[1]]
## or
strsplit('AaaABbbBCcc', '(?!^)(?=[A-Z][a-z])', perl=TRUE)[[1]]
# [1] "AaaA" "BbbB" "Ccc" 
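
As for why the plain lookahead also splits after each capital: as far as I understand it, strsplit works through the string piece by piece, and when the match is empty and sits at the very start of what is left, it emits a single character as a token and moves on. The sketch below is a rough re-implementation of that loop in plain R, not the actual internal code; `split_sketch` is a hypothetical helper name, and the branch condition is an assumption based on my reading of the behaviour.

```r
# Hypothetical helper: a rough sketch of the loop strsplit(perl = TRUE)
# appears to run internally. This is an assumption, not the real source.
split_sketch <- function(x, pattern) {
  out <- character(0)
  while (nzchar(x)) {
    m <- regexpr(pattern, x, perl = TRUE)
    if (m == -1L) {
      # No further match: the rest of the string is the last token.
      out <- c(out, x)
      break
    }
    start <- as.integer(m)           # 1-based position of the match
    len <- attr(m, "match.length")   # 0 for a pure lookahead
    if (start + len > 1L) {
      # The match ends past the start of the remaining string:
      # emit everything to its left, then drop the match and that text.
      out <- c(out, substr(x, 1L, start - 1L))
      x <- substr(x, start + len, nchar(x))
    } else {
      # Empty match at the very start: emit one character and advance.
      out <- c(out, substr(x, 1L, 1L))
      x <- substr(x, 2L, nchar(x))
    }
  }
  out
}

split_sketch('AaaBbbCcc', '(?=[A-Z])')
# [1] "A"  "aa" "B"  "bb" "C"  "cc"
split_sketch('AaaBbbCcc', '(?!^)(?=[A-Z])')
# [1] "Aaa" "Bbb" "Ccc"
```

After each token the remaining string again begins with a capital, so the bare lookahead matches at its start and a lone "A", "B" or "C" is split off. With (?!^) that start-of-string match is rejected on every pass, so the next match is the one before the following capital.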

Upvotes: 3
