Seabizkit

Reputation: 2415

JavaScript split on char but ignoring double escaped chars

I'm trying to do something similar to the question linked below, but can't get it working.

How to split a comma separated String while ignoring escaped commas?

I have tried to figure it out but can't seem to get it right.

I would like to split the string on : but not on the escaped form \\:
(my escape sequence is a double backslash)

given: dtet:du\\,eduh ei\\:di:e,j
expected outcome: ["dtet", "du\\,eduh ei\\:di", "e,j"]

regex link: https://regex101.com/r/12j6er/1/

Upvotes: 3

Views: 2430

Answers (4)

kcpr

Reputation: 1105

This is probably not a "super fancy" solution, but it is possibly a more time-efficient one. It also supports escaping the escape character, and it works in browsers that do not support lookbehinds.

function splitByDelimiterIfItIsNotEscaped (text, delimiter, escapeCharacter) {
    const splittedText = []
    // Length of the run of escape characters immediately before the current character
    let numberOfEscapeCharacters = 0
    let nextSplittedTextPartIndex = 0
    for (let characterIndex = 0; characterIndex < text.length; characterIndex++) {
        const character = text[characterIndex]
        if (character === escapeCharacter) {
            numberOfEscapeCharacters++
        } else if (character === delimiter && numberOfEscapeCharacters % 2 === 0) {
            // An even run means the escape characters escape each other, so this delimiter is not escaped
            splittedText.push(text.substring(nextSplittedTextPartIndex, characterIndex))
            nextSplittedTextPartIndex = characterIndex + 1
            numberOfEscapeCharacters = 0
        } else {
            numberOfEscapeCharacters = 0
        }
    }
    // Whatever follows the last unescaped delimiter (possibly an empty string) is the final part
    splittedText.push(text.substring(nextSplittedTextPartIndex))
    return splittedText
}

function onChange () {
    console.log(splitByDelimiterIfItIsNotEscaped(inputBox.value, ':', '\\'))
}

addEventListener('change', onChange)

onChange()
After making a change unfocus the input box (use tab for example).
<input id="inputBox" value="dtet:du\,eduh ei\:di:e,j"/>
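The parity check on the run of escape characters can also be tried without a page or an input box. The following is a minimal sketch of the same idea; the name splitUnescaped and its variable names are made up for this illustration, and it assumes a single-character delimiter and a single-character escape character, as in the snippet above.

```javascript
// Sketch only: splitUnescaped is a hypothetical name, not part of the answer above.
function splitUnescaped(text, delimiter, escapeChar) {
  const parts = []
  let escapeRun = 0 // escape characters seen directly before the current character
  let start = 0     // index where the current part begins
  for (let i = 0; i < text.length; i++) {
    const ch = text[i]
    if (ch === escapeChar) {
      escapeRun++
      continue
    }
    // A delimiter is live only when an even number of escape characters precedes it
    if (ch === delimiter && escapeRun % 2 === 0) {
      parts.push(text.substring(start, i))
      start = i + 1
    }
    escapeRun = 0
  }
  parts.push(text.substring(start))
  return parts
}

// One backslash escapes the colon, so "ei\:di" stays together
console.log(splitUnescaped('dtet:du\\,eduh ei\\:di:e,j', ':', '\\'))
// Two backslashes escape each other, so the colon after them splits
console.log(splitUnescaped('a\\\\:b', ':', '\\'))
// Three backslashes: the last one escapes the colon again
console.log(splitUnescaped('a\\\\\\:b', ':', '\\'))
```

Note that the string literals above use JavaScript escaping, so '\\' in the source is a single backslash in the actual string.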

Upvotes: 0

Jay

Reputation: 3950

I came up with two solutions: one that adjusts the array contents after a plain split, and one that uses a regex.

Solution 1:

Approach: Split on :, then move the elements into a new array, gluing back together those that should not have been split.

function splitcolon(input) {
    var inparts = input.split(":");
    var outparts = [];
    var splitwaslegit = true;
    inparts.forEach(function(part) {
        if (splitwaslegit) {
            outparts.push(part);
        } else { // the split was not justified, so reverse it by gluing this part to the previous one
            outparts[outparts.length-1] += ":" + part;
        }
        // the next split was legit if this part doesn't end on \\
        splitwaslegit = (part.substring(part.length-2) !== "\\\\");
    });
    return outparts;
}

Tested in chrome:

splitcolon("dtet:du\\\\,eduh ei\\\\:di:e,j")
(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]

Note:
You could of course also use a for loop or Underscore's each instead of forEach.
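As a rough illustration of that note, here is the same glue-back approach written with a plain for loop. The name splitColonLoop is hypothetical, and String.prototype.endsWith (ES2015) is swapped in for the substring comparison; the logic is otherwise meant to mirror splitcolon above.

```javascript
// Sketch only: splitColonLoop is a made-up name for a for-loop variant of splitcolon.
function splitColonLoop(input) {
  var inparts = input.split(":");
  var outparts = [];
  var splitWasLegit = true;
  for (var i = 0; i < inparts.length; i++) {
    var part = inparts[i];
    if (splitWasLegit) {
      outparts.push(part);
    } else {
      // undo the previous split by gluing this part back onto the last one
      outparts[outparts.length - 1] += ":" + part;
    }
    // the next split is legit unless this part ends with two backslashes
    splitWasLegit = !part.endsWith("\\\\");
  }
  return outparts;
}

console.log(splitColonLoop("dtet:du\\\\,eduh ei\\\\:di:e,j"));
```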

Solution 2:

Approach: If there is any char or string of which you can be 100% sure that it won't be in the input, then you can use that char/string as a temporary delimiter inserted by a regex like this:

var tmpdelim = "\x00"; // must *never* ever occur in input string

var input = "dtet:du\\\\,eduh ei\\\\:di:e,j";
input.replace(/(^.?|[^\\].|.[^\\]):/g, "$1" + tmpdelim).split(tmpdelim);

Result:

(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]

Explanation of regular expression /(^.?|[^\\].|.[^\\]):/g:

/ - start of regex
( - matching group 1 start
^.? - we're at the start of the input, or a single char away from it (the escape requires 2 chars)
| - or
[^\\]. - any char that is not a \, followed by any char
| - or
.[^\\] - any char, followed by any char that is not a \
) - matching group 1 stop
: - matching group 1 (which therefore can't be \\) must be followed by a :
/ - end of regex
g - regex modifier global (match all occurrences, not just the first)

Each match is replaced with $1 + tmpdelim, i.e. whatever was in matching group 1 followed by our special delimiter (instead of the :), which we can then use for splitting.
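To make the two steps visible, the sketch below runs the replace and the split separately on the question's input. The helper name splitViaTmpDelim is made up for this illustration. One caveat worth checking against your own data: because group 1 consumes up to two characters before each :, a field of fewer than two characters after the first field (for example the b in a:b:c) leaves its trailing : unmatched in this sketch.

```javascript
// Sketch only: splitViaTmpDelim is a hypothetical wrapper around the two steps.
var tmpdelim = "\x00"; // assumed never to occur in the input

function splitViaTmpDelim(input) {
  // Step 1: swap every ":" that is not preceded by "\\" for the temporary delimiter
  var marked = input.replace(/(^.?|[^\\].|.[^\\]):/g, "$1" + tmpdelim);
  // Step 2: split on the temporary delimiter; escaped colons are still in place
  return marked.split(tmpdelim);
}

console.log(splitViaTmpDelim("dtet:du\\\\,eduh ei\\\\:di:e,j"));

// Caveat: the second ":" follows a one-character field and is not replaced
console.log(splitViaTmpDelim("a:b:c"));
```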

Bonus solution

Manoj Verma's answer as a one-liner:

input.split("").reverse().join("").split(/:(?!\\\\)/).reverse().map(x => x.split("").reverse().join(""));

Result:

(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]

Upvotes: 2

Manoj Verma

Reputation: 151

This approach is a little lengthy, but it works. JavaScript regular expressions historically did not support lookbehinds (they were only added in ES2018, so older engines lack them). You can work around that by reversing the original string and splitting it using a lookahead, then reversing the resulting array and every string in it to get your result.

function reverse(s) {
  var o = '';
  for (var i = s.length - 1; i >= 0; i--)
    o += s[i];
  return o;
}


var str = "dtet:du\\,eduh ei\\:di:e,j";
var res = reverse(str);
var result = res.split(/:(?!\\)/g);
result = result.reverse();
for (var i = 0; i < result.length; i++) {
  result[i] = reverse(result[i]);
}

console.log(result);
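For comparison, on engines that do implement ES2018 lookbehind assertions, the reversal can be skipped and the condition written directly. This is an alternative technique, not the method of the answer above, and the sketch assumes the double-backslash escape from the question.

```javascript
// Sketch only: requires an engine with ES2018 lookbehind support.
// String.raw keeps the backslashes exactly as typed, so the input contains "\\," and "\\:".
const input = String.raw`dtet:du\\,eduh ei\\:di:e,j`

// Split on ":" unless it is immediately preceded by two backslashes
const parts = input.split(/(?<!\\\\):/)

console.log(parts)
```

On an engine without lookbehind support the regex literal is a syntax error, so the reverse-and-lookahead approach remains the portable option.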

Upvotes: 2

Patrick Roberts

Reputation: 51946

See the function below named splitOnNonEscapedDelimeter(), which accepts the string to split and the delimiter to split on, which in this case is :. It is used within the function onChange().

Note that you must escape the delimiter you pass to splitOnNonEscapedDelimeter() so that it is not interpreted as a special character in the regular expression.

function nonEscapedDelimeter(delimeter) {
  return new RegExp(String.raw`[^${delimeter}]*?(?:\\\\${delimeter}[^${delimeter}]*?)*(?:${delimeter}|$)`, 'g')
}

function nonEscapedDelimeterAtEnd(delimeter) {
  return new RegExp(String.raw`([^\\].|.[^\\]|^.?)${delimeter}$`)
}

function splitOnNonEscapedDelimeter(string, delimeter) {
  const reMatch = nonEscapedDelimeter(delimeter)
  const reReplace = nonEscapedDelimeterAtEnd(delimeter)

  return string.match(reMatch).slice(0, -1).map(section => {
    return section.replace(reReplace, '$1')
  })
}

function onChange() {
  console.log(splitOnNonEscapedDelimeter(i.value, ':'))
}

i.addEventListener('change', onChange)

onChange()
<textarea id=i>dtet:du\\,eduh ei\\:di:e,j</textarea>

Requirements

This solution makes use of the ES2015 features String.raw() and template literals for convenience, though these are not required. See their documentation to understand how they work, and use a polyfill if your target platform does not support these features.

Explanation

new RegExp(String.raw`[^${delimeter}]*?(?:\\\\${delimeter}[^${delimeter}]*?)*(?:${delimeter}|$)`, 'g')

The function nonEscapedDelimeter() creates a regular expression that does almost what is required, except with a few quirks that need to be corrected with some post-processing.

string.match(reMatch)

The regular expression, when used in String#match(), splits the string into sections that either end with a non-escaped delimiter or run to the end of the string. This also has the side effect of matching a 0-width section at the end of the string, thus we need to

.slice(0, -1)

to remove that match in post-processing.
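As a small check of that step, the sketch below hard-codes the regular expression that nonEscapedDelimeter(':') would build and prints the raw matches before and after .slice(0, -1). The variable names are made up for this illustration.

```javascript
// Sketch only: the literal below is what nonEscapedDelimeter(':') is assumed to produce.
const reMatch = /[^:]*?(?:\\\\:[^:]*?)*(?::|$)/g
const input = String.raw`dtet:du\\,eduh ei\\:di:e,j`

// Every section still carries its trailing ":", and the last match is the empty string
const rawSections = input.match(reMatch)
console.log(rawSections)

// Dropping the final 0-width match leaves one section per field
const sections = rawSections.slice(0, -1)
console.log(sections.length) // 3
```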

new RegExp(String.raw`([^\\].|.[^\\]|^.?)${delimeter}$`)

...

.map(section => {
  return section.replace(reReplace, '$1')
})

Since each section now ends with the delimiter except for the last one (which ends at the end of the string), we need to .map() the array of matches and remove the trailing non-escaped delimiter, if there is one. A trailing delimiter must be left alone when it is escaped, which is why nonEscapedDelimeterAtEnd() is so complicated.
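The clean-up step can be tried in isolation. The sketch below hard-codes the regular expression that nonEscapedDelimeterAtEnd(':') would build and applies it to a few hand-written sections; the sample sections and variable names are assumptions made for this illustration.

```javascript
// Sketch only: the literal below is what nonEscapedDelimeterAtEnd(':') is assumed to produce.
const reReplace = /([^\\].|.[^\\]|^.?):$/

const samples = [
  'dtet:',                          // trailing ":" is not escaped and gets removed
  String.raw`du\\,eduh ei\\:di:`,   // only the final ":" is removed, the escaped one stays
  'e,j',                            // no trailing ":", left unchanged
  String.raw`x\\:`                  // trailing ":" is escaped, left unchanged
]

// '$1' puts back the characters captured in front of the delimiter
const cleaned = samples.map(section => section.replace(reReplace, '$1'))
console.log(cleaned)
```

Replacing with an empty string instead of '$1' would also delete the captured characters in front of the delimiter, turning 'dtet:' into 'dt'.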

Upvotes: 3
