Reputation: 793
We are using JS to load JSON data that often has multiple backslashes before a newline character. Example:
{
"test": {
"title": "line 1\\\\\\\nline2"
}
}
I've tried a variety of regex patterns with replace. "Oddly", they seem to work when there is an even number of backslashes, but not an odd number.
This sample, with 2 backslashes, works:
"\\n".replace(/\\(?=.{2})/g, '');
While this sample, with 3, does not:
"\\\n".replace(/\\(?=.{2})/g, '');
Here's the JS in action:
console.log('Even Slashes:');
console.log("\\n".replace(/\\(?=.{2})/g, ''));
console.log('Odd Slashes:');
console.log("\\\n".replace(/\\(?=.{2})/g, ''));
Upvotes: 2
Views: 948
Reputation:
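To remove all escaped escapes (pairs of backslashes) from a source text, use:
find: /([^\\]|^)(?:\\\\)+/g
replace: $1 (JavaScript's syntax for the first capture group; some other tools write it as \1)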
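Here is a minimal sketch of that find/replace run through String.prototype.replace. The helper name removeEscapedEscapes is hypothetical, and String.raw is used only so the test strings keep their backslashes exactly as typed.

```javascript
// Sketch: the find/replace above expressed with String.prototype.replace.
// In a JavaScript replacement string the first capture group is written $1.
const removeEscapedEscapes = s => s.replace(/([^\\]|^)(?:\\\\)+/g, '$1');

// String.raw keeps the backslashes: "line 1" + three backslashes + "n" + "line2"
const raw = String.raw`line 1\\\nline2`;

// One pair of backslashes is removed; the single escape before "n" remains
console.log(removeEscapedEscapes(raw)); // line 1\nline2

// An even run (four backslashes) is removed entirely
console.log(removeEscapedEscapes(String.raw`a\\\\b`)); // ab
```

Note that the pattern strips complete pairs only, so an odd run always leaves exactly one backslash behind.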
Upvotes: 1
Reputation: 149020
As I mentioned in my earlier comment, you are dealing with two different escape sequences here:
- \n is an escape sequence for the newline character, i.e. Unicode Character 'LINE FEED (LF)' (U+000A)
- \\ is an escape sequence for the backslash, i.e. Unicode Character 'REVERSE SOLIDUS' (U+005C)

Although these escape sequences are two characters in source code, each one represents only a single character in memory.
Observe:
const toEscaped = s => JSON.stringify(s); // toSource() is non-standard and missing from most browsers; JSON.stringify also prints the escaped form
['\n', '\\n', '\\\n', '\\\\n', '\\\\\n']
.forEach(s => console.log(`There are ${s.length} character(s) in ${toEscaped(s)}`))
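This also applies in regular expressions. The newline written as \n counts as one character, and . does not match a newline unless the s (dotAll) flag is set. The lookahead (?=.{2}) only checks that two more non-newline characters follow the backslash (it does not capture or consume them), and in both of your samples only one character follows it, which is probably why you're seeing some strangeness in the way your replacement works.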
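A quick console check makes this concrete. This is a sketch; the variable names are hypothetical and the strings are the two samples from the question.

```javascript
// Sketch: what the question's lookahead sees once the string literals are parsed.
const lookahead = /\\(?=.{2})/; // a backslash followed by at least two more characters

const evenInSource = "\\n";  // in memory: backslash + letter n (2 characters)
const oddInSource = "\\\n";  // in memory: backslash + newline (2 characters)

console.log(evenInSource.length, oddInSource.length); // 2 2

// Only one character follows the backslash in either string, so .{2} never matches
console.log(lookahead.test(evenInSource)); // false
console.log(lookahead.test(oddInSource));  // false

// Without the s (dotAll) flag, . does not match a line terminator such as a newline
console.log(/./.test("\n"));  // false
console.log(/./s.test("\n")); // true
```

So neither sample is actually modified by the replace; the "even" case only looks right because its output already reads as backslash + n.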
However, based on reading some of your comments, it sounds like you might be dealing with incorrect encodings. For example, you may have some cases where a user enters foo\nbar in an input field, which is interpreted as a literal \ followed by n (i.e. "foo\\nbar"), and now you want to interpret this as a newline character (i.e. "foo\nbar"). In that case, you're not actually trying to remove \ characters; you're trying to convert the two-character sequence \ + n into a single newline character, \n.
The following code snippet shows how to perform the escape sequence substitutions for \\ and \n:
const toEscaped = s => JSON.stringify(s);
const toHex = s => Array.from(s).map((_, i) => s.charCodeAt(i).toString(16).padStart(2, '0')).join('+');
['\n', '\\n', '\\\n', '\\\\n', '\\\\\n']
.map(s => ({ a: s, b: s.replace(/\\n/g, '\n').replace(/\\\\/g, '\\') }))
.forEach(({a, b}) => console.log(`${toEscaped(a)} --> ${toHex(b)}`))
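One caveat with chaining the two replace calls: the \n substitution runs first, so an escaped backslash followed by the letter n (\\n in the raw text) becomes a backslash plus a newline. Below is a sketch of a single-pass alternative; it is my own variant rather than the snippet above, unescapeSimple is a hypothetical helper, and it only handles \\ and \n.

```javascript
// Sketch (an alternative, not the code above): unescape \\ and \n in one pass so that
// an escaped backslash followed by the letter n is not mistaken for a newline escape.
const unescapeSimple = s =>
  s.replace(/\\([\\n])/g, (_, c) => (c === 'n' ? '\n' : '\\'));

const input = String.raw`a\\nb`; // in memory: a, backslash, backslash, n, b

// Chained replaces: the \n substitution runs first and consumes the second backslash
const chained = input.replace(/\\n/g, '\n').replace(/\\\\/g, '\\');
const singlePass = unescapeSimple(input);

console.log(JSON.stringify(chained));    // "a\\\nb" (backslash + newline)
console.log(JSON.stringify(singlePass)); // "a\\nb" (backslash + letter n)
```

Because the regex consumes each backslash together with the character after it, every escape is decoded exactly once, left to right.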
And to both replace "\\n" with "\n" and remove any "\\" characters preceding it, try something like this:
const toEscaped = s => JSON.stringify(s);
const toHex = s => Array.from(s).map((_, i) => s.charCodeAt(i).toString(16).padStart(2, '0')).join('+');
['\n', '\\n', '\\\n', '\\\\n', '\\\\\n']
.map(s => ({ a: s, b: s.replace(/\\+[n\n]/g, '\n') }))
.forEach(({a, b}) => console.log(`${toEscaped(a)} --> ${toHex(b)}`))
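Applied to the JSON from the question, that replacement might look like the sketch below. The variable names are illustrative, and String.raw is only there so the JSON text reaches JSON.parse with its backslashes intact.

```javascript
// Sketch: apply the /\\+[n\n]/g replacement to the question's JSON after parsing it.
const jsonText = String.raw`{"test": {"title": "line 1\\\\\\\nline2"}}`;
const data = JSON.parse(jsonText);

// Parsed title: "line 1" + three backslashes + a newline + "line2"
console.log(JSON.stringify(data.test.title)); // "line 1\\\\\\\nline2"

// Collapse the run of backslashes and the newline (or letter n) into one newline
const cleaned = data.test.title.replace(/\\+[n\n]/g, '\n');
console.log(JSON.stringify(cleaned)); // "line 1\nline2"
```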
Upvotes: 1
Reputation: 92284
I think you are trying to remove all backslashes that come before a newline: str.replace(/\\+\n/g, "\n").
Also, you may be misunderstanding how escape sequences work:
- "\\" is one backslash
- "\\n" is one backslash followed by the letter n
See the code below for an explanation. Note that Stack Overflow's snippet console re-encodes the string when printing it; the browser's own dev tools console is better at showing the actual characters.
const regex = /\\+\n/g;
// This is "Hello" + [two backslashes] + "nworld"
const evenSlashes = "Hello\\\\nworld";
// This is "Hello" + [two backslashes] + [newline] + "world"
const oddSlashes = "Hello\\\\\nworld";
console.log({
evenSlashes,
oddSlashes,
// Doesn't replace anything because there's no newline on this string
replacedEvenSlashes: evenSlashes.replace(regex, "\n"),
// All backslashes before new line are replaced
replacedOddSlashes: oddSlashes.replace(regex, "\n")
});
Upvotes: 1