Manuel Weinkauf

Reputation: 350

Regular expression to embed string with backslash and curly braces in more curly braces

This is a cross-post from TeX, where it did not get any answers. Since I assume the problem has more to do with my understanding of regular expressions (or rather, lack thereof) than with LaTeX itself, StackOverflow may have been the better place to ask to begin with.

I would like to use BibTool (which is written in C, if this is of any consequence here) to enclose some strings in a .bib file in curly braces. The test bib entry looks like this:

@Article{Cite1,
author       = {Adelbert, A.},
date         = {2020},
journaltitle = {A Journal},
title        = {A title with just \textit{Test} structure and some chemistry \ce{CO2}},
number       = {2},
pages        = {1--4},
volume       = {1},
}

I have created the following BibTool resource file:

resource {biblatex}
preserve.keys = on
preserve.key.case = on
rewrite.rule = {"\\\(.*{.*}\)" "{{\1}}"}

The rewrite.rule is supposed to do the following:

  1. Find all strings within any field that start with \, like \ce{}, \textit{}, etc. This is done by the \\ at the beginning of the regular expression.
  2. When such a string is found, save the following in a group, denoted by \(\): an arbitrary string, followed by {, another arbitrary string, followed by }; i.e. the string textit{Test}.
  3. Write this string back into the same position, but enclose it in a double set of curly braces: "{{\1}}".

What it manages so far:

  1. It apparently finds all commands starting with \.
  2. It saves the strings and writes them back into the file.

So far, the code returns the following:

@Article{Cite1,
Author       = {Adelbert, A.},
Date         = {2020},
JournalTitle = {A Journal},
Title        = {A title with just {{textit{Test} structure and some chemistry {{ce{CO2}}}}}},
Number       = {2},
Pages        = {1--4},
Volume       = {1},
}

As you can see, it finds the strings and puts {{ at the beginning of each one. Unfortunately, it puts }} at the end of the field rather than at the end of the string, so I now have six closing braces at the end of the title field. The braces do balance, but two of them should come right after {{textit{Test}, not at the very end.

I tried various constructions such as rewrite.rule = {"\\\(.*{.*}\)$" "{{\1}}"}, rewrite.rule = {"\\\(.*{.*}\) ?$" "{{\1}}"}, and rewrite.rule = {"\\\(.*{.*}\)*$" "{{\1}}"}, but none of them worked.

When I try to get the \ back at the beginning of the string using rewrite.rule = {"\\\(.*{.*}\)" "{{\\\1}}"}, I do get the \ back, but also thousands of {} until I hit a Rewrite limit exceeded error.

I am not very good with regular expressions and would be grateful for any comments.

Upvotes: 2

Views: 369

Answers (2)

Gerd Neugebauer

Reputation: 136

My approach would use two phases. In the first phase, I would process each macro with one argument and, in the result, replace the \ with a placeholder (here ##). The placeholder keeps the first rule from matching its own output again. In the second phase, I simply replace ## with \.

In BibTool this looks as follows:

rewrite.rule {"\\\(\([a-zA-Z]+\|.\){[^{}]*}\)" "{##\1}"}
rewrite.rule {"##" "\\"} 
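For readers who want to try the idea outside BibTool, here is a minimal Python sketch of the same two phases. It is only an illustration under assumptions: the pattern is my translation of the Emacs-style expression above into Python's re syntax, and the names MACRO and protect_macros are hypothetical. Python's re.sub makes a single pass, so the ## placeholder is not strictly needed there; it is kept to mirror the two rules.

```python
import re

# Assumed Python translation of "\\\(\([a-zA-Z]+\|.\){[^{}]*}\)":
# a backslash, then (captured) a macro name made of letters or one single
# character, followed by one brace argument that contains no further braces.
MACRO = re.compile(r"\\((?:[a-zA-Z]+|.)\{[^{}]*\})")


def protect_macros(value: str) -> str:
    """Wrap every one-argument macro in an extra pair of braces."""
    # Phase 1: wrap the macro and write "##" in place of the backslash.
    marked = MACRO.sub(lambda m: "{##" + m.group(1) + "}", value)
    # Phase 2: turn the placeholder back into a backslash.
    return marked.replace("##", "\\")


title = r"A title with just \textit{Test} structure and some chemistry \ce{CO2}"
print(protect_macros(title))
```

Like the rules above, this sketch assumes the field does not already contain ## for some other purpose.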

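Note that, in general, the task described cannot be solved with regular expressions: braces can nest to arbitrary depth, and a regular expression cannot keep count of matching pairs. The rule above only handles macros whose argument contains no further braces.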
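To illustrate that limitation: an argument such as \textit{a {b} c} contains nested braces, so a pattern built on [^{}]* will not match it. Handling such input needs something that tracks brace depth. The following Python sketch shows one way to do that outside BibTool; the function name wrap_macros is hypothetical, and the code is an illustration under these assumptions, not a feature of BibTool.

```python
import string


def wrap_macros(value: str) -> str:
    """Wrap each \\name{...} in braces, using a depth counter to find the matching '}'."""
    out = []
    i, n = 0, len(value)
    while i < n:
        if value[i] != "\\":
            out.append(value[i])
            i += 1
            continue
        # Read the macro name: a run of ASCII letters, or one other character.
        j = i + 1
        while j < n and value[j] in string.ascii_letters:
            j += 1
        if j == i + 1 and j < n:
            j += 1
        if j >= n or value[j] != "{":
            # No brace argument follows: copy the macro unchanged.
            out.append(value[i:j])
            i = j
            continue
        # Walk forward until the brace depth returns to zero.
        depth, k = 0, j
        while k < n:
            if value[k] == "\\":
                k += 2  # skip escaped characters such as \{ or \}
                continue
            if value[k] == "{":
                depth += 1
            elif value[k] == "}":
                depth -= 1
                if depth == 0:
                    break
            k += 1
        if k >= n:
            # Unbalanced braces: copy the rest unchanged and stop.
            out.append(value[i:])
            break
        out.append("{" + value[i:k + 1] + "}")
        i = k + 1
    return "".join(out)


print(wrap_macros(r"\textit{a {b} c} and \ce{CO2}"))
```

Macros nested inside another macro's argument are left as they are, because the outer macro is wrapped as a whole.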

Upvotes: 2

Zwiers

Reputation: 3658

The behavior of .* by default is to match as many characters as possible. This is called 'greedy matching' in regex terms.

Your pattern is likely matching the following on hitting the first \:

\textit{Test} structure and some chemistry \ce{CO2}}

The replacement turns that text into:

{{textit{Test} structure and some chemistry \ce{CO2}}}}

It then finds the next \ and replaces again:

\ce{CO2}}}} becomes {{ce{CO2}}}}}}

Total effect:

{A title with just \textit{Test} structure and some chemistry \ce{CO2}}

{A title with just {{textit{Test} structure and some chemistry {{ce{CO2}}}}}}

To change this behaviour in most regex flavors, you can put a ? after the quantifier (.*?) to make it 'lazy', that is, to match as few characters as possible.
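The difference is easy to see in Python's re module, which supports lazy quantifiers. This is only a sketch: the patterns are my Python translations of the rule from the question, and it does not establish whether the regex library used by BibTool accepts .*? at all.

```python
import re

field = r"{A title with just \textit{Test} structure and some chemistry \ce{CO2}}"

# Greedy: each .* grabs as much as possible, so the match runs to the last }.
greedy = re.search(r"\\(.*\{.*\})", field)
# Lazy: each .*? grabs as little as possible, so the match stops at the first }.
lazy = re.search(r"\\(.*?\{.*?\})", field)

print(greedy.group(0))
print(lazy.group(0))

# With the lazy pattern, a single substitution pass wraps each macro separately.
wrapped = re.sub(r"\\(.*?\{.*?\})", lambda m: "{{\\" + m.group(1) + "}}", field)
print(wrapped)
```

The lazy version still stops at the first closing brace it meets, so it would cut a macro with nested braces short.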

Upvotes: 2
