MedicineMan

Reputation: 15314

RegEx needed to split javascript string on "|" but not "\|"

We would like to split a string on instances of the pipe character |, but not if that character is preceded by an escape character, e.g. \|.

For example, we would like the following string to be split into the following components:

1|2|3\|4|5

1
2
3\|4
5

I expect to be able to use JavaScript's split function, which accepts a regular expression. What regex should I pass to split? We are cross-platform and would like to support the current and previous versions (one version back) of IE, FF, and Chrome if possible.

Upvotes: 1

Views: 1324

Answers (3)

Bart Kiers

Reputation: 170158

Instead of a split, do a global match (the same way a lexical analyzer would):

  • match any character other than \ or |
  • or match any escaped char

Something like this:

var str = "1|2|3\\|4|5";
var matches = str.match(/([^\\|]|\\.)+/g);

A quick explanation: ([^\\|]|\\.) matches either any character except '\' and '|' (pattern: [^\\|]) or (pattern: |) any escaped character (pattern: \\.). The + after it means "one or more times", so the group ([^\\|]|\\.) is therefore matched once or more. The g at the end of the regex literal tells the JavaScript regex engine to find every match in the string instead of stopping after the first one.
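As a rough illustration of the matching approach described above, here is a minimal sketch that wraps it in a helper. The name splitOnUnescapedPipe is hypothetical and not part of the original answer, and a non-capturing group is used since the capture itself is not needed. One thing to be aware of: because + requires at least one character, empty fields are dropped (for instance "a||b" produces two items, not three).

```javascript
// Hypothetical helper name; wraps the global-match approach described above.
function splitOnUnescapedPipe(str) {
    // match() returns null when nothing matches (e.g. for an empty string),
    // so fall back to an empty array in that case.
    return str.match(/(?:[^\\|]|\\.)+/g) || [];
}

var parts = splitOnUnescapedPipe("1|2|3\\|4|5");
// parts holds four items: 1, 2, 3\|4 and 5
```

If empty fields matter for your data, this approach would need adjusting; the sketch only covers the case shown in the question.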

Upvotes: 9

Mamsaac

Reputation: 6273

A regex solution was posted as I was looking into this. So I just went ahead and wrote one without it. I did some simple benchmarks and it is -slightly- faster (I expected it to be slower...).

Without using Regex, if I understood what you desire, this should do the job:

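function doSplit(input) {
    var output = [];
    var currPos = -1, // start before index 0 so a leading '|' is not skipped
        prevPos = -1; // index of the last unescaped '|' seen so far
    while ((currPos = input.indexOf('|', currPos + 1)) != -1) {
        // Ignore pipes preceded by a backslash. charAt is used instead of
        // input[i], which very old IE versions do not support on strings.
        if (input.charAt(currPos - 1) == "\\") continue;
        output.push(input.substring(prevPos + 1, currPos));
        prevPos = currPos;
    }
    // Whatever follows the last unescaped pipe is the final field.
    output.push(input.substring(prevPos + 1));
    return output;
}
doSplit('1|2|3\\|4|5'); //returns [ '1', '2', '3\\|4', '5' ]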

Upvotes: 0

alc

Reputation: 1557

What you're looking for is a "negative look-behind", which JavaScript regular expressions have not traditionally supported, so it has to be emulated.

This isn't pretty, but it should get the list into a splittable form for you:

var output = input.replace(/(\\)?\|/g, function($0,$1){ return $1 ? $0 : '\n'; });

This takes your input string and replaces every '|' character NOT immediately preceded by a '\' character with a '\n' character, leaving escaped pipes untouched. Splitting the result on '\n' then gives the components.
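To end up with an actual array, the replace step can be followed by a split on the newline. The sketch below is one way this might look; the function names are hypothetical, and it assumes the input never contains a '\n' of its own. The second function uses a real negative look-behind, which only works on engines that implement it (ES2018 and later; IE never did), so it would not meet the browser requirements in the question.

```javascript
// Hypothetical helper; assumes the input contains no newline characters.
function splitViaReplace(input) {
    // Optionally capture a backslash directly in front of each pipe.
    var marked = input.replace(/(\\)?\|/g, function (match, escape) {
        // Escaped pipe: keep "\|" as it is. Bare pipe: swap in the separator.
        return escape ? match : '\n';
    });
    return marked.split('\n');
}

// Hypothetical helper; requires look-behind support (ES2018+).
// On engines without it, this regex literal is a syntax error.
function splitViaLookbehind(input) {
    return input.split(/(?<!\\)\|/);
}

var viaReplace = splitViaReplace('1|2|3\\|4|5');
var viaLookbehind = splitViaLookbehind('1|2|3\\|4|5');
// Both hold four items: 1, 2, 3\|4 and 5
```

Unlike a global match on non-pipe runs, both variants keep empty fields, so "a||b" would give three items.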

Upvotes: 1
