Reputation: 2415
I'm trying to do something similar to this, but can't get it working:
How to split a comma separated String while ignoring escaped commas?
I have tried to figure it out, but can't seem to get it right.
I would like to split the string on : but not on the escaped one, \\: (my escape character is a double backslash).
given: dtet:du\\,eduh ei\\:di:e,j
expected outcome: ["dtet", "du\\,eduh ei\\:di", "e,j"]
regex link: https://regex101.com/r/12j6er/1/
Upvotes: 3
Views: 2430
Reputation: 1105
It's probably not a "super fancy" solution, but it is possibly a more time-efficient one. Escaping the escape character itself is also supported, and it works in browsers that don't support lookbehinds.
function splitByDelimiterIfItIsNotEscaped (text, delimiter, escapeCharacter) {
  const splittedText = []
  // Number of consecutive escape characters directly before the current character
  let numberOfEscapeCharactersBefore = 0
  let nextSplittedTextPartIndex = 0
  for (let characterIndex = 0; characterIndex < text.length; characterIndex++) {
    const character = text[characterIndex]
    if (character === escapeCharacter) {
      numberOfEscapeCharactersBefore++
    } else {
      // A delimiter is escaped only if it follows an odd number of escape characters
      if (character === delimiter && numberOfEscapeCharactersBefore % 2 === 0) {
        splittedText.push(text.substring(nextSplittedTextPartIndex, characterIndex))
        nextSplittedTextPartIndex = characterIndex + 1
      }
      numberOfEscapeCharactersBefore = 0
    }
  }
  // Whatever follows the last unescaped delimiter (possibly empty) is the final part
  splittedText.push(text.substring(nextSplittedTextPartIndex))
  return splittedText
}
function onChange () {
  console.log(splitByDelimiterIfItIsNotEscaped(inputBox.value, ':', '\\'))
}
addEventListener('change', onChange)
onChange()
After making a change, unfocus the input box (press Tab, for example).
<input id="inputBox" value="dtet:du\,eduh ei\:di:e,j"/>
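The loop above relies on one rule: a delimiter is escaped exactly when it is preceded by an odd number of escape characters, which is why an escaped escape character does not escape the delimiter after it. A minimal sketch of that check in isolation, assuming a single-character escape as in the snippet (isEscapedAt is a hypothetical helper, not part of the answer):

```javascript
// Hypothetical helper: true when text[index] is escaped, i.e. preceded by an
// odd number of consecutive escapeCharacter occurrences.
function isEscapedAt(text, index, escapeCharacter) {
  let count = 0
  // Walk backwards over the run of escape characters directly before index
  for (let i = index - 1; i >= 0 && text[i] === escapeCharacter; i--) {
    count++
  }
  return count % 2 === 1
}

console.log(isEscapedAt('a\\:b', 2, '\\'))   // true  (one backslash before ':')
console.log(isEscapedAt('a\\\\:b', 3, '\\')) // false (the backslash itself is escaped)
```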
Upvotes: 0
Reputation: 3950
I came up with two solutions: one based on adjusting array contents and one that uses a regex.
Solution 1:
Approach: split on :, then push the elements into a new array, gluing back together the ones that should not have been split.
function splitcolon(input) {
  var inparts = input.split(":");
  var outparts = [];
  var splitwaslegit = true;
  inparts.forEach(function(part) {
    if (splitwaslegit) {
      outparts.push(part);
    } else { // the split was not justified, so reverse it by gluing this part to the previous one
      outparts[outparts.length-1] += ":" + part;
    }
    // the next split was legit if this part doesn't end on \\
    splitwaslegit = (part.substring(part.length-2) !== "\\\\");
  });
  return outparts;
}
Tested in Chrome:
splitcolon("dtet:du\\\\,eduh ei\\\\:di:e,j")
(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]
Note: you could of course also use a for loop or Underscore's each instead of forEach.
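Following that note, a minimal sketch of the same glue-back approach written with a for...of loop and String#endsWith (both ES2015) instead of forEach, assuming the same two-character escape \\ as splitcolon (splitcolonLoop is a hypothetical name):

```javascript
// Same approach as splitcolon: split on every ':' and re-join each piece
// whose preceding piece ended with the two-character escape \\
function splitcolonLoop(input) {
  const outparts = []
  let splitWasLegit = true
  for (const part of input.split(':')) {
    if (splitWasLegit) {
      outparts.push(part)
    } else {
      // Undo the split: glue this part back onto the previous one
      outparts[outparts.length - 1] += ':' + part
    }
    // The next split is legit only if this part doesn't end with \\
    splitWasLegit = !part.endsWith('\\\\')
  }
  return outparts
}

console.log(splitcolonLoop('dtet:du\\\\,eduh ei\\\\:di:e,j'))
```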
Solution 2:
Approach: if there is any character or string that you can be 100% sure won't occur in the input, you can use it as a temporary delimiter, inserted by a regex like this:
var tmpdelim = "\x00"; // must *never* ever occur in input string
var input = "dtet:du\\\\,eduh ei\\\\:di:e,j";
input.replace(/(^.?|[^\\].|.[^\\]):/g, "$1" + tmpdelim).split(tmpdelim);
Result:
(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]
Explanation of the regular expression /(^.?|[^\\].|.[^\\]):/g:
/ - start of the regex
( - start of matching group 1
^.? - we're at the start of the input or a single character away from it (the escape requires two)
| - or
[^\\]. - any character that is not a \ followed by any other character
| - or
.[^\\] - any character followed by anything other than a \
) - end of matching group 1
: - the matching group (which can't be \\) must be followed by a :
/ - end of the regex
g - global modifier (match all occurrences, not just the first)
We replace each match with $1 + tmpdelim, that is, with whatever was in matching group 1 followed by our special delimiter (instead of the :), which we can then use for splitting.
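Because this regex consumes the characters in front of each colon, a match cannot start on a character the previous match already used, so a one-character field between two delimiters is not split: with the regex above, "ab:c:d" becomes ["ab", "c:d"]. A minimal sketch of the same temporary-delimiter idea that avoids this by matching escape sequences and bare colons in a single pass and deciding in a replacement callback (splitWithTmpDelim is a hypothetical name; it assumes \\ escapes exactly the one character that follows it):

```javascript
// Sketch: replace only *unescaped* colons with a temporary delimiter, then split.
// Assumption: the escape is the two-character sequence \\ and it escapes the
// single character that follows it.
function splitWithTmpDelim(input) {
  const tmpdelim = '\x00' // must never occur in the input, as in Solution 2
  return input
    // Alternative 1 matches an escape sequence (\\ plus the next character) and keeps it;
    // alternative 2 matches a bare ':', which therefore is not escaped.
    .replace(/\\\\[\s\S]|:/g, match => (match === ':' ? tmpdelim : match))
    .split(tmpdelim)
}

console.log(splitWithTmpDelim('ab:c:d')) // ['ab', 'c', 'd']
```

Consuming each escape sequence as a whole token is what lets the regex step over escaped colons without looking at the characters before them.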
Bonus solution
Manjo Verma's answer as a one-liner:
input.split("").reverse().join("").split(/:(?!\\\\)/).reverse().map(x => x.split("").reverse().join(""));
Result:
(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]
Upvotes: 2
Reputation: 151
This is a somewhat lengthy approach, but it works. JavaScript regular expressions did not support lookbehinds before ES2018, and older browsers still lack them. But you can get the same effect by reversing the original string and splitting it using a lookahead. Then reverse the array and every string in it, and you will get your result.
function reverse(s) {
  var o = '';
  for (var i = s.length - 1; i >= 0; i--)
    o += s[i];
  return o;
}
var str = "dtet:du\\,eduh ei\\:di:e,j";
var res = reverse(str);
var result = res.split(/:(?!\\)/g);
result = result.reverse();
for (var i = 0; i < result.length; i++) {
  result[i] = reverse(result[i]);
}
console.log(result);
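On engines that support ES2018 lookbehind assertions, the reversing trick is not needed. A minimal sketch, assuming the single-backslash escape used in this answer and, like the reverse-based version, not handling an escaped escape character:

```javascript
// Split on ':' unless it is directly preceded by a single backslash.
// (?<!\\) is a negative lookbehind: "not preceded by \".
const str = 'dtet:du\\,eduh ei\\:di:e,j'
const result = str.split(/(?<!\\):/)
console.log(result)
```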
Upvotes: 2
Reputation: 51946
See the function below named splitOnNonEscapedDelimeter(), which accepts the string to split and the delimeter to split on (in this case :). Its usage is shown in the function onChange().
Note that you must escape the delimeter you pass to splitOnNonEscapedDelimeter() so that it is not interpreted as a special character in the regular expression.
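One way to do that escaping is a helper widely used for this purpose (a version of it appears in MDN's regular expressions guide). escapeRegExp is not a built-in here, so treat this as a sketch:

```javascript
// Hypothetical helper: backslash-escape every character that has a special
// meaning in a regular expression, so the result matches the text literally.
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Unescaped, '|' would be read as alternation and '.' as "any character"
console.log(escapeRegExp('|'))   // \|
console.log(escapeRegExp('a.b')) // a\.b
console.log('x|y|z'.split(new RegExp(escapeRegExp('|')))) // ['x', 'y', 'z']
```

The escaped result could then be passed on, e.g. splitOnNonEscapedDelimeter(text, escapeRegExp('|')).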
function nonEscapedDelimeter(delimeter) {
  return new RegExp(String.raw`[^${delimeter}]*?(?:\\\\${delimeter}[^${delimeter}]*?)*(?:${delimeter}|$)`, 'g')
}
function nonEscapedDelimeterAtEnd(delimeter) {
  return new RegExp(String.raw`([^\\].|.[^\\]|^.?)${delimeter}$`)
}
function splitOnNonEscapedDelimeter(string, delimeter) {
  const reMatch = nonEscapedDelimeter(delimeter)
  const reReplace = nonEscapedDelimeterAtEnd(delimeter)
  return string.match(reMatch).slice(0, -1).map(section => {
    return section.replace(reReplace, '$1')
  })
}
function onChange() {
  console.log(splitOnNonEscapedDelimeter(i.value, ':'))
}
i.addEventListener('change', onChange)
onChange()
<textarea id=i>dtet:du\\,eduh ei\\:di:e,j</textarea>
This solution uses the ES2015 features String.raw() and template literals for convenience, though they are not required. See their documentation to understand how they work, and use a polyfill if your target platform does not support these features.
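To illustrate that String.raw and template literals are only a convenience, a sketch of the two regex builders rewritten with plain ES5 string concatenation (the ...ES5 names are hypothetical). In an ordinary string literal every backslash must be doubled, so the four backslashes in the raw template become eight:

```javascript
// Builders from the answer (ES2015: String.raw + template literals)
function nonEscapedDelimeter(delimeter) {
  return new RegExp(String.raw`[^${delimeter}]*?(?:\\\\${delimeter}[^${delimeter}]*?)*(?:${delimeter}|$)`, 'g')
}
function nonEscapedDelimeterAtEnd(delimeter) {
  return new RegExp(String.raw`([^\\].|.[^\\]|^.?)${delimeter}$`)
}

// Hypothetical ES5 equivalents: ordinary strings, so each backslash is written twice
function nonEscapedDelimeterES5(delimeter) {
  return new RegExp('[^' + delimeter + ']*?(?:\\\\\\\\' + delimeter + '[^' + delimeter + ']*?)*(?:' + delimeter + '|$)', 'g')
}
function nonEscapedDelimeterAtEndES5(delimeter) {
  return new RegExp('([^\\\\].|.[^\\\\]|^.?)' + delimeter + '$')
}

// Both pairs build identical patterns
console.log(nonEscapedDelimeter(':').source === nonEscapedDelimeterES5(':').source)           // true
console.log(nonEscapedDelimeterAtEnd(':').source === nonEscapedDelimeterAtEndES5(':').source) // true
```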
new RegExp(String.raw`[^${delimeter}]*?(?:\\\\${delimeter}[^${delimeter}]*?)*(?:${delimeter}|$)`, 'g')
The function nonEscapedDelimeter() creates a regular expression that does almost what is required, apart from a few quirks that need to be corrected with some post-processing.
string.match(reMatch)
The regular expression, when used with String#match(), splits the string into sections that end either with a non-escaped delimiter or at the end of the string. As a side effect, it also matches a zero-width section at the end of the string, so we need .slice(0, -1) to remove that match in post-processing.
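To see that intermediate result, a sketch that runs only the match step on the question's input (written as a JS string literal, so each \\\\ below stands for two actual backslashes) before and after .slice(0, -1):

```javascript
function nonEscapedDelimeter(delimeter) {
  return new RegExp(String.raw`[^${delimeter}]*?(?:\\\\${delimeter}[^${delimeter}]*?)*(?:${delimeter}|$)`, 'g')
}

const input = 'dtet:du\\\\,eduh ei\\\\:di:e,j'
// Sections end at a non-escaped ':' or at the end of the string; the last
// element is the zero-width match at the very end.
const sections = input.match(nonEscapedDelimeter(':'))
console.log(sections.length)    // 4
console.log(sections[3] === '') // true
// Dropping that empty match leaves three sections, still ending in ':'
const trimmed = sections.slice(0, -1)
console.log(trimmed)
```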
new RegExp(String.raw`([^\\].|.[^\\]|^.?)${delimeter}$`)
...
.map(section => {
  return section.replace(reReplace, '$1')
})
Since every section except the last one (which ends at the end of the string) now ends with the delimiter, we need to .map() over the array of matches and remove the non-escaped delimiter if it is there. That is why nonEscapedDelimeterAtEnd() is so complicated: it must leave an escaped delimiter in place.
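A short check of that post-processing regex on its own, using the same builder as the answer. An unescaped trailing ':' is removed while group 1 is put back via '$1', and an escaped one is left alone (strings below are JS literals, so \\\\ is two actual backslashes):

```javascript
function nonEscapedDelimeterAtEnd(delimeter) {
  return new RegExp(String.raw`([^\\].|.[^\\]|^.?)${delimeter}$`)
}

const reReplace = nonEscapedDelimeterAtEnd(':')
// Group 1 captures the (at most two) characters before the ':' so they can be
// restored with '$1'; those two characters may not both be backslashes.
console.log('dtet:'.replace(reReplace, '$1'))    // dtet
console.log('b:'.replace(reReplace, '$1'))       // b (the ^.? branch)
console.log(':'.replace(reReplace, '$1') === '') // true (empty section)
console.log('ei\\\\:'.replace(reReplace, '$1'))  // ei\\: (escaped, unchanged)
```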
Upvotes: 3