Reputation: 142
I try to simplify the nested curly braces in LaTeX.
In other words: { { ... } }
| { { { ... } } }
| etc. → { ... }
For example, there is a section of TeX as follows:
$$\begin{aligned}
\pi &= \frac{1}{2}\sum\limits_{k = 0}^\infty {\frac{1}{{{{16}^k}}}} \left( {\frac{8}{{8k + 2}} + \frac{4}{{8k + 3}} + \frac{4}{{8k + 4}} - \frac{1}{{8k + 7}}} \right)\\
\zeta (2) &= \frac{{{\pi ^2}}}{6} = \frac{3}{{16}}\sum\limits_{k = 0}^\infty {\frac{1}{{{{64}^k}}}} \left( {\frac{{16}}{{{{(6k + 1)}^2}}} - \frac{{24}}{{{{(6k + 2)}^2}}} - \frac{8}{{{{(6k + 3)}^2}}} - \frac{6}{{{{(6k + 4)}^2}}} + \frac{1}{{{{(6k + 5)}^2}}}} \right)\\
\zeta (3) &= \frac{9}{{224}}\sum\limits_{k = 0}^\infty {\frac{1}{{{{4096}^k}}}}
\left(\begin{aligned}
&\frac{{1024}}{{{{(24k + 2)}^3}}} - \frac{{3072}}{{{{(24k + 3)}^3}}} + \frac{{512}}{{{{(24k + 4)}^3}}} + \frac{{1024}}{{{{(24k + 6)}^3}}} + \frac{{1152}}{{{{(24k + 8)}^3}}}\\
+& \frac{{384}}{{{{(24k + 9)}^3}}} + \frac{{64}}{{{{(24k + 10)}^3}}} + \frac{{128}}{{{{(24k + 12)}^3}}} + \frac{{16}}{{{{(24k + 14)}^3}}} + \frac{{48}}{{{{(24k + 15)}^3}}} + \frac{{72}}{{{{(24k + 16)}^3}}}\\
+& \frac{{16}}{{{{(24k + 18)}^3}}} + \frac{2}{{{{(24k + 20)}^3}}} - \frac{6}{{{{(24k + 21)}^3}}} + \frac{1}{{{{(24k + 22)}^3}}}\\
\end{aligned} \right)\\
\end{aligned}$$
The curly braces can be nested, and there may be spaces between the curly braces, but they must appear in pairs.
I tried the regular expression { *{((?>[^{}]+|{{[^}]*}})*)} *}
, but it cannot match all the cases.
How can I improve my regex, or this can't be done by regex, I must write a simple parser?
Upvotes: 3
Views: 550
Reputation: 18970
Doing this with Regex in JavaScript is a mess.
First, since LaTeX is a structured file format a specialized parser would be more appropriate.
siefkenj/latex-parser pretty printer plugin (based on Michael Brade's LaTeX.js) could be the silver bullet answer to your problem. (However, there are some problems parsing TeX in JS as well.. but that's a different story.)
Second, since JS' Regex engine does not support recursive patterns things get even more difficult. We could use a pattern like this (multiple times if needed) to get rid of excess braces:
\{[ ]*(\{(?:[^{}]+|(?1))*\})[ ]*\}
Unfortunately, syntax like (?1)
recursing the first subpattern or (?R)
recursing the entire pattern cannot be used in JS. Nonetheless, as demonstrated by Steven Levithan
Still, given that there is a known maximum amount of recursion that needs to be accounted for, it's quite possible. Here's the solution offered, which works just fine with JavaScript (it doesn't use any advanced regex features, actually):
@[^{]+{(?:[^{}]|{[^{}]*})*}
However, this works only if:
- braces are always balanced, and ...
Adapting this to the problem at hand is not particularly hard, let's say for two levels of recursion
{[ ]*({(?:{(?:{.*?}|.)*?}|.)*?})[ ]*}
but there is a catch: Since we do not have an actual recursive pattern we need to 'recurse' the .replace
of braces manually using a loop or helper function (like in the demo code below)
const str = `\$\$\\begin{aligned}
\\pi &= \\frac{1}{2}\\sum\\limits_{k = 0}^\\infty {\\frac{1}{{{{16}^k}}}} \\left( {\\frac{8}{{8k + 2}} + \\frac{4}{{8k + 3}} + \\frac{4}{{8k + 4}} - \\frac{1}{{8k + 7}}} \\right)\\\\
\\zeta (2) &= \\frac{{{\\pi ^2}}}{6} = \\frac{3}{{16}}\\sum\\limits_{k = 0}^\\infty {\\frac{1}{{{{64}^k}}}} \\left( {\\frac{{16}}{{{{(6k + 1)}^2}}} - \\frac{{24}}{{{{(6k + 2)}^2}}} - \\frac{8}{{{{(6k + 3)}^2}}} - \\frac{6}{{{{(6k + 4)}^2}}} + \\frac{1}{{{{(6k + 5)}^2}}}} \\right)\\\\
\\zeta (3) &= \\frac{9}{{224}}\\sum\\limits_{k = 0}^\\infty {\\frac{1}{{{{4096}^k}}}}
\\left(\\begin{aligned}
&\\frac{{1024}}{{{{(24k + 2)}^3}}} - \\frac{{3072}}{{{{(24k + 3)}^3}}} + \\frac{{512}}{{{{(24k + 4)}^3}}} + \\frac{{1024}}{{{{(24k + 6)}^3}}} + \\frac{{1152}}{{{{(24k + 8)}^3}}}\\\\
+& \\frac{{384}}{{{{(24k + 9)}^3}}} + \\frac{{64}}{{{{(24k + 10)}^3}}} + \\frac{{128}}{{{{(24k + 12)}^3}}} + \\frac{{16}}{{{{(24k + 14)}^3}}} + \\frac{{48}}{{{{(24k + 15)}^3}}} + \\frac{{72}}{{{{(24k + 16)}^3}}}\\\\
+& \\frac{{16}}{{{{(24k + 18)}^3}}} + \\frac{2}{{{{(24k + 20)}^3}}} - \\frac{6}{{{{(24k + 21)}^3}}} + \\frac{1}{{{{(24k + 22)}^3}}}\\\\
\\end{aligned} \\right)\\\\
\\end{aligned}\$\$`;
const subst = `$1`;
const regex = /{[ ]*({(?:{(?:{.*?}|.)*?}|.)*?})[ ]*}/gm;
String.prototype.replacerec = function (pattern, what) {
var newstr = this.replace(pattern, what);
if (newstr == this)
return newstr;
return newstr.replace(pattern, what);
};
console.log(
str.replacerec(regex, subst)
);
The combination of the pattern and the recursive function allows (theoretically) to clean any level of nested braces. If you are working with large files you might need to use an iterative approach or defer to a parser as suggested.
When I render the modified LaTeX code,e.g. here, I get the same (visual) output.
Upvotes: 2
Reputation: 89565
A method independant of the nesting level, that uses placeholders.
let latex = `$$\\begin{aligned}
\\pi &= \\frac{1}{2}\\sum\\limits_{k = 0}^\\infty {\\frac{1}{{{{16}^k}}}} \\left( {\\frac{8}{{8k + 2}} + \\frac{4}{{8k + 3}} + \\frac{4}{{8k + 4}} - \\frac{1}{{8k + 7}}} \\right)\\\\
\\zeta (2) &= \\frac{{{\\pi ^2}}}{6} = \\frac{3}{{16}}\\sum\\limits_{k = 0}^\\infty {\\frac{1}{{{{64}^k}}}} \\left( {\\frac{{16}}{{{{(6k + 1)}^2}}} - \\frac{{24}}{{{{(6k + 2)}^2}}} - \\frac{8}{{{{(6k + 3)}^2}}} - \\frac{6}{{{{(6k + 4)}^2}}} + \\frac{1}{{{{(6k + 5)}^2}}}} \\right)\\\\
\\zeta (3) &= \\frac{9}{{224}}\\sum\\limits_{k = 0}^\\infty {\\frac{1}{{{{4096}^k}}}}
\\left(\\begin{aligned}
&\\frac{{1024}}{{{{(24k + 2)}^3}}} - \\frac{{3072}}{{{{(24k + 3)}^3}}} + \\frac{{512}}{{{{(24k + 4)}^3}}} + \\frac{{1024}}{{{{(24k + 6)}^3}}} + \\frac{{1152}}{{{{(24k + 8)}^3}}}\\\\
+& \\frac{{384}}{{{{(24k + 9)}^3}}} + \\frac{{64}}{{{{(24k + 10)}^3}}} + \\frac{{128}}{{{{(24k + 12)}^3}}} + \\frac{{16}}{{{{(24k + 14)}^3}}} + \\frac{{48}}{{{{(24k + 15)}^3}}} + \\frac{{72}}{{{{(24k + 16)}^3}}}\\\\
+& \\frac{{16}}{{{{(24k + 18)}^3}}} + \\frac{2}{{{{(24k + 20)}^3}}} - \\frac{6}{{{{(24k + 21)}^3}}} + \\frac{1}{{{{(24k + 22)}^3}}}\\\\
\\end{aligned} \\right)\\\\
\\end{aligned}$$`;
// We protect each eventual literal tilde in the string
latex = latex.replace(/~/g, '~~');
// We replace each innermost substring enclosed between curly brackets by
// a key. The key is used to store the matched substring in a Map.
let substrings = new Map(),
repNo = 1; // No of replacements initialized with a fake value to enter
// the loop.
// The replacement pattern checks if the substring isn't a key enclosed
// between curly brackets: in this case brackets are removed since the key
// is a placeholder for a substring already enclosed between curly brackets.
// This replacement is repeated until there's nothing to replace.
for (let level = 0; repNo > 0; level++) {
repNo = 0;
latex = latex.replace(/{(?:(~l\d+n\d+~)|[^{}]*)}/g, (m, g1) => {
// it's a key -> remove the brackets
if (g1)
return g1;
// otherwise, replace with a placeholder (the key) and
// store the substring in the Map.
let key = `~l${level}n${repNo++}~`;
substrings.set(key, m);
return key;
});
}
// Now, we can replace each placeholder key with its value.
do {
repNo = 0;
latex = latex.replace(/~l\d+n\d+~/g, (m) => {
repNo++;
return substrings.get(m);
});
} while (repNo);
// restore the eventual escaped tildes
latex = latex.replace(/~~/g, '~');
console.log(latex);
Note that the format of the placeholder is totally arbitrary, feel free to build your own. For instance, there's no need to include the level inside it, but with it you can take a look at the string between the two loops and see the maximum nesting level.
Upvotes: 2
Reputation: 498
Regular expressions are the wrong tool for this job. Regular expressions can determine a regular language but they can't determine a context-free language like this one (parenthesis matching is one of the most common examples of context-free languages). All in all, you need to write a stack-based parser (pushdown automaton).
Upvotes: 1