Jesper Palm

Reputation: 7238

Regex split on upper case and first digit

I need to split the string "thisIs12MyString" into an array like [ "this", "Is", "12", "My", "String" ]

I've gotten as far as "thisIs12MyString".split(/(?=[A-Z0-9])/), but it splits before every digit and gives the array [ "this", "Is", "1", "2", "My", "String" ]

In other words, I need to split the string before each uppercase letter and before each digit that is not preceded by another digit.
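The rule stated above maps directly onto a lookbehind assertion: split before a capital, or at a position that is followed by a digit but not preceded by one. A minimal sketch, assuming an engine with lookbehind support (ECMAScript 2018 or later, e.g. current Node.js and browsers); `splitWords` is a hypothetical helper name:

```javascript
// Split before every uppercase letter, and before a digit only when the
// character in front of it is NOT a digit. (?<!\d) requires ES2018+.
function splitWords(str) {
  return str.split(/(?=[A-Z])|(?<!\d)(?=\d)/);
}

console.log(splitWords("thisIs12MyString"));
// [ 'this', 'Is', '12', 'My', 'String' ]
```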

Upvotes: 4

Views: 2240

Answers (5)

Shiplu Mokaddim

Reputation: 57660

In my Rhino console:

js> "thisIs12MyString".replace(/([A-Z]|\d+)/g, function(x){return " "+x;}).split(/ /);
this,Is,12,My,String

Another one:

js> "thisIs12MyString".split(/(?:([A-Z]+[a-z]+))/g).filter(function(a){return  a;});
this,Is,12,My,String
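To see why the filter() call in the second version is needed, here is the intermediate result of the split on its own. The (?:...) wrapper in the answer changes nothing, so it is dropped in this sketch, and the g flag has no effect on split():

```javascript
// A capturing group in the split pattern makes split() keep each matched
// word in the output, which leaves empty strings wherever two matches are
// adjacent or a match ends the string.
var raw = "thisIs12MyString".split(/([A-Z]+[a-z]+)/);
console.log(raw);   // [ 'this', 'Is', '12', 'My', '', 'String', '' ]

// filter() then drops the empty strings (they are falsy).
var words = raw.filter(function (a) { return a; });
console.log(words); // [ 'this', 'Is', '12', 'My', 'String' ]
```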

Upvotes: 1

georg

Reputation: 214969

Are you looking for this?

"thisIs12MyString".match(/[A-Z]?[a-z]+|[0-9]+/g)

returns

["this", "Is", "12", "My", "String"]
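Two caveats with this match-based approach, shown in a hedged sketch: match() returns null rather than an empty array when nothing matches, and characters the pattern does not cover are silently dropped, so an all-caps run such as "XML" disappears. The `withAcronyms` pattern and the `matchWords` helper below are hypothetical extensions, not part of the answer:

```javascript
// The answer's pattern: an optional capital followed by lowercase letters,
// or a run of digits.
var basic = /[A-Z]?[a-z]+|[0-9]+/g;

// Hypothetical extension: also accept a run of capitals that is not
// followed by a lowercase letter (an acronym such as "XML").
var withAcronyms = /[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+/g;

function matchWords(str, re) {
  // match() returns null when nothing matches, so fall back to [].
  return str.match(re) || [];
}

console.log(matchWords("thisIs12MyString", basic));       // [ 'this', 'Is', '12', 'My', 'String' ]
console.log(matchWords("XMLHttp2Request", basic));        // [ 'Http', '2', 'Request' ]
console.log(matchWords("XMLHttp2Request", withAcronyms)); // [ 'XML', 'Http', '2', 'Request' ]
console.log(matchWords("___", basic));                    // []
```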

Upvotes: 9

Felix Kling

Reputation: 816552

As I said in my comment, my approach would be to insert a special character before each sequence of digits first, as a marker:

"thisIs12MyString".replace(/\d+/g, '~$&').split(/(?=[A-Z])|~/)

where ~ could be replaced by any other character, preferably a non-printable one (e.g. a control character), since such a character is unlikely to appear "naturally" in a string.

In that case, you could even insert the marker before each capital letter as well, and omit the lookahead, making the split very easy:

"thisIs12MyString".replace(/\d+|[A-Z]/g, '~$&').split('~')

It may or may not perform better than the lookahead version; only a benchmark on your target engine would tell.
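A minimal sketch of both marker variants, assuming the input never contains a NUL character (U+0000), which is used here as the non-printable marker in place of ~. The function names are hypothetical:

```javascript
// NUL (U+0000) is the marker; this assumes the input never contains it.
var MARKER = "\u0000";

// Variant 1: mark each run of digits, then split before capitals or at a marker.
function splitLookahead(str) {
  return str.replace(/\d+/g, MARKER + "$&").split(/(?=[A-Z])|\u0000/);
}

// Variant 2: mark each run of digits and each capital, then split on the marker only.
function splitMarkerOnly(str) {
  return str.replace(/\d+|[A-Z]/g, MARKER + "$&").split(MARKER);
}

console.log(splitLookahead("thisIs12MyString"));  // [ 'this', 'Is', '12', 'My', 'String' ]
console.log(splitMarkerOnly("thisIs12MyString")); // [ 'this', 'Is', '12', 'My', 'String' ]
console.log(splitLookahead("ThisIs"));            // [ 'This', 'Is' ]
console.log(splitMarkerOnly("ThisIs"));           // [ '', 'This', 'Is' ]
```

The two are not quite equivalent: when the string starts with a capital, the marker-only variant yields a leading empty string (which .filter(Boolean) would remove), and both yield one when the string starts with a digit.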

Upvotes: 3

lorenzo-s

Reputation: 17010

You can work around the missing lookbehind support in older JavaScript engines by post-processing the array that your current regex produces.
A quick sketch:

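// Returns true when the piece is exactly one digit, e.g. "1".
function isSingleDigit(word) {
    return /^\d$/.test(word);
}

var result = [];
var digitsFlag = false; // true while the previous piece was a digit
"thisIs12MyString".split(/(?=[A-Z0-9])/).forEach(function(word) {

    if (isSingleDigit(word)) {
        if (!digitsFlag) {
            result.push(word);                 // first digit of a run: start a new piece
        } else {
            result[result.length - 1] += word; // later digits: append to that piece
        }
        digitsFlag = true;
    } else {
        result.push(word);
        digitsFlag = false;
    }

});

console.log(result); // [ 'this', 'Is', '12', 'My', 'String' ]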
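The same post-processing can also be written with reduce(). This sketch is a slight variation rather than a direct translation: it glues any piece that starts with a digit onto a preceding all-digit piece, so "a12b3" gives [ 'a', '12b', '3' ] (matching the rule in the question), where the forEach loop would give [ 'a', '1', '2b', '3' ]. `mergeDigits` is a hypothetical name:

```javascript
// Merge a piece that starts with a digit into the previous piece when that
// previous piece consists only of digits; otherwise start a new piece.
function mergeDigits(pieces) {
  return pieces.reduce(function (result, word) {
    var last = result[result.length - 1];
    if (last !== undefined && /^\d+$/.test(last) && /^\d/.test(word)) {
      result[result.length - 1] = last + word;
    } else {
      result.push(word);
    }
    return result;
  }, []);
}

console.log(mergeDigits("thisIs12MyString".split(/(?=[A-Z0-9])/)));
// [ 'this', 'Is', '12', 'My', 'String' ]
console.log(mergeDigits("a12b3".split(/(?=[A-Z0-9])/)));
// [ 'a', '12b', '3' ]
```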

Upvotes: 0

Olivier Refalo

Reputation: 51455

I can't think of any way to achieve this with a regex alone.

I think you will need to do it in code.

See this recipe, which solves the same problem in a different language (Ruby); the code is at the bottom: http://code.activestate.com/recipes/440698-split-string-on-capitalizeduppercase-char/
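Doing it in code in JavaScript can be as simple as one pass over the characters. A minimal sketch, assuming ASCII letters and digits only; `splitCamelDigits` is a hypothetical name. A new piece starts before an uppercase letter, or before a digit whose predecessor is not a digit:

```javascript
function splitCamelDigits(s) {
  // ASCII-only checks; other scripts would need a different test.
  var isUpper = function (c) { return c >= "A" && c <= "Z"; };
  var isDigit = function (c) { return c >= "0" && c <= "9"; };

  var parts = [];
  var start = 0;
  for (var i = 1; i < s.length; i++) {
    // Boundary: a capital, or the first digit of a run of digits.
    if (isUpper(s[i]) || (isDigit(s[i]) && !isDigit(s[i - 1]))) {
      parts.push(s.slice(start, i));
      start = i;
    }
  }
  if (s.length > 0) parts.push(s.slice(start)); // last piece
  return parts;
}

console.log(splitCamelDigits("thisIs12MyString")); // [ 'this', 'Is', '12', 'My', 'String' ]
```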

Upvotes: -1
