Reputation: 190
I'm trying to replace double quotes with curly quotes, except when the text is wrapped in certain tags, like [quote] and [code].
Sample input
[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>"Why no goodbye?" replied [b]Bob[/b]. "It's always Hello!"</p>
Expected output
[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>“Why no goodbye?” replied [b]Bob[/b]. “It's always Hello!”</p>
I figured how to elegantly achieve what I want in PHP by using (*SKIP)(*F)
, however my code will be run in javascript, and the javascript solution is less than ideal.
Right now I'm splitting the string at those tags, running the replace, then putting the string together:
var o = 3;
a = a
.split(/(\[(?<first>(?:icode|quote|code))[^\]]*?\](?:[\s]*?.)*?[\s]*?\[\/(?:\k<first>)\])/i)
.map(function(x,i) {
if (i == o-1 && x) {
x = '';
}
else if (i == o && x)
{
x = x.replace(/(?![^<]*>|[^\[]*\])"([^"]*?)"/gi, '“$1”')
o = o+3;
}
return x;
}).join('');
Javascript Regex Breakdown
split()
:
(\[(?<first>icode|quote|code)[^\]]*?\](?:.)*?\[\/(\k<first>)\])
- captures the pattern inside parentheses:
\[(?<first>quote|code|icode)[^\]]*?\]
- a [quote]
, [code]
, or [icode]
opening tag, with or without parameters like =html
, eg [code=html]
(?:[\s]*?.)*?
- any 0+ (as few as possible) occurrences of any char (.
), preceded or not by whitespace, so it doesn't break if the opening tag is followed by a line break[\s]*?
- 0+ whitespaces\[\/(\k<first>)\]
- [\quote]
, [\code]
, or [\icode]
closing tags. Matches the text captured in the (?<first>)
group. Eg: if it's a quote opening tag, it'll be a quote closing tagreplace()
:
(?![^<]*>|[^\[]*\])"([^"]*?)"
- captures text inside double quotes:
(?![^<]*>|[^\[]*\])
- negative lookahead, looks for characters (that aren't <
or [
) followed by either >
or ]
and discards them, so it won't match anything inside bbcode and html tags. Eg: [spoiler="Name"]
or <span style="color: #24c4f9">
. Note that matches wrapped in tags are left untouched."
- literal opening double quotes character.([^"]*?)
- any 0+ character, except double quotes."
- literal closing double quotes character.SPLIT() REGEX DEMO: https://regex101.com/r/Ugy3GG/1
That's awful, because the replace is executed multiple times.
Meanwhile, the same result can be achieved with a single PHP regex. The regex I wrote was based on Match regex pattern that isn't within a bbcode tag.
(\[(?<first>quote|code|icode)[^\]]*?\](?:[\s]*?.)*?[\s]*?\[\/(\k<first>)\])(*SKIP)(*F)|(?![^<]*>|[^\[]*\])"([^"]*?)"
PHP Regex Breakdown
(\[(?<first>quote|code|icode)[^\]]*?\](?:[\s]*?.)*?[\s]*?\[\/(\k<first>)\])(*SKIP)(*F)
- matches the pattern inside capturing parentheses just like javascript split()
above, then (*SKIP)(*F)
make the regex engine omit the matched text.|
- or(?![^<]*>|[^\[]*\])"([^"]*?)"
- captures text inside double quotes in the same way javascript replace()
doesPHP DEMO: https://regex101.com/r/fB0lyI/1
The beauty of this regex is that it only needs to be run once. No splitting and joining of strings. Is there a way to implement it in javascript?
Upvotes: 1
Views: 619
Reputation: 48721
Because JS lacks backtracking verbs you will need to consume those bracketed chunks but later replace them as is. By obtaining the second side of the alternation from your own regex the final regex would be:
\[(quote|i?code)[^\]]*\][\s\S]*?\[\/\1\]|(?![^<]*>|[^\[]*\])"([^"]*)"
But the tricky part is using a callback function with replace()
method:
str.replace(regex, function($0, $1, $2) {
return $1 ? $0 : '“' + $2 + '”';
})
Above ternary operator returns $0
(whole match) if first capturing group exists otherwise it encloses second capturing group value in curly quotes and returns it.
Note: this may fail in different cases.
See live demo here
Upvotes: 2
Reputation: 16726
Nested markup is hard to parse with rx, and JS's RegExp in particular. Complex regular expressions also hard to read, maintain, and debug. If your needs are simple, a tag content replacement with some banned tags excluded, consider a simple code-based alternative to run-on RegExps:
function curly(str) {
var excludes = {
quote: 1,
code: 1,
icode: 1
},
xpath = [];
return str.split(/(\[[^\]]+\])/) // breakup by tag markup
.map(x => { // for each tag and content:
if (x[0] === "[") { // tag markup:
if (x[1] === "/") { // close tag
xpath.pop(); // remove from current path
} else { // open tag
xpath.push(x.slice(1).split(/\W/)[0]); // add to current path
} //end if open/close tag
} else { // tag content
if (xpath.every(tag =>!excludes[tag])) x = x.replace(/"/g, function repr() {
return (repr.z = !repr.z) ? "“" : "”"; // flip flop return value (naive)
});
} //end if markup or content?
return x;
}) // end term map
.join("");
} /* end curly() */
var input = `[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>"Why no goodbye?" replied [b]Bob[/b]. "It's always Hello!"</p>`;
var wants = `[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>“Why no goodbye?” replied [b]Bob[/b]. “It's always Hello!”</p>`;
curly(input) == wants; // true
To my eyes, even though it a bit longer, code allows documentation, indentation, and explicit naming that makes these sort of semi-complicated logical operations easier to understand.
If your needs are more complex, use a true BBCode parser for JavaScript and map/filter/reduce it's model as needed.
Upvotes: 1