Don_B

Reputation: 243

How to escape special characters in a regex replace in C#?

I have a text file that contains strings like

<disp-formula id="deqn*"><text-notation="math">\begin{equation*}
x=5 \tag{5}
y=3 \tag{6}
x+y=8 \tag {7}
\end{equation*}</text-notation="math"></disp-formula>

<disp-formula id="deqn*"><text-notation="math">\begin{equation*}
x+y=5 \tag{3}
\end{equation*}</text-notation="math"></disp-formula>

<disp-formula id="deqn*"><text-notation="math">\begin{equation*}
a+y=15 \tag {4a}
\end{equation*}</text-notation="math"></disp-formula>

<disp-formula id="deqn*"><text-notation="math">\begin{equation*}
x=5 \tag {9a}
y=3 \tag{10}
x+y=8 \tag{11}
\end{equation*}</text-notation="math"></disp-formula>
...etc

I'm trying to convert them to

<disp-formula id="deqn5-7"><text-notation="math">\begin{equation*}
x=5 \tag{5}
y=3 \tag{6}
x+y=8 \tag {7}
\end{equation*}</text-notation="math"></disp-formula>

<disp-formula id="deqn3"><text-notation="math">\begin{equation*}
x+y=5 \tag{3}
\end{equation*}</text-notation="math"></disp-formula>

<disp-formula id="deqn4a"><text-notation="math">\begin{equation*}
a+y=15 \tag {4a}
\end{equation*}</text-notation="math"></disp-formula>

<disp-formula id="deqn9a-11"><text-notation="math">\begin{equation*}
x=5 \tag {9a}
y=3 \tag{10}
x+y=8 \tag{11}
\end{equation*}</text-notation="math"></disp-formula>
...etc

using a couple of regex replacements on the file. The first pattern is

(?s)(<disp-formula id="deqn)[^"]*?("(?:.(?!/disp-formula))+?.\\tag\s?\{)([^}]+?)(\}(?:.(?!/disp-formula))+.\\tag\s?\{)([^}]+?)\}

which is replaced by

$1$3-$5$2$3$4$5}

and the second regex is

(?s)(<disp-formula id="deqn)[^"]*?("(?:.(?!/disp-formula|\\tag))+?.\\tag\s?\{)([^}]+?)(\}(?:.(?!/disp-formula|\\tag))+?</disp-formula>)

which is replaced by

$1$3$2$3$4

Both regexes were tested with http://regexstorm.net/tester and work there, but they do not work when I use them in my code.

I think I'm failing to escape some characters in my regex. Can anyone help? Here is my code:

string content=File.ReadAllText(@"D:\test\00057_po.txt");
string pattern1 = "(?s)(<disp-formula id=\"deqn)[^\"]*?(\"(?:.(?!/disp-formula))+?.\\tag\\s?{{)([^}]+?)(}}(?:.(?!/disp-formula))+.\\tag\\s?{{)([^}]+?)}}";
string replacement1 = "$1$3-$5$2$3$4$5}}";
string pattern2="(?s)(<disp-formula id=\"deqn)[^\"]*?(\"(?:.(?!/disp-formula|\\tag))+?.\\tag\\s?{{)([^}]+?)(}}(?:.(?!/disp-formula|\\tag))+?</disp-formula>)";
string replacement2 = "$1$3$2$3$4";
Regex rgx = new Regex(pattern1);
Regex rgx2 = new Regex(pattern2);
string result1 = rgx.Replace(content, replacement1);
string result2 = rgx2.Replace(result1, replacement2);
File.WriteAllText(@"D:\test\00057_po.txt",result2);

Upvotes: 1

Views: 106

Answers (2)

Tamal Banerjee

Reputation: 503

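Try these. In a regular (non-verbatim) C# string the regex \\tag has to be written as \\\\tag, otherwise the regex engine sees \t followed by "ag" and looks for a tab. A literal brace is a single escaped \\{ or \\}, not {{ or }}: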

    string pattern1 = "(?s)(<disp-formula id=\"deqn)[^\"]*?(\"(?:.(?!/disp-formula))+?.\\\\tag\\s?\\{)([^\\}]+?)(\\}(?:.(?!/disp-formula))+.\\\\tag\\s?\\{)([^}]+?)(?=\\})";
    string replacement1 = "$1$3-$5$2$3$4$5";

    string pattern2="(?s)(<disp-formula id=\"deqn)[^\"]*?(\"(?:.(?!/disp-formula|\\\\tag))+?.\\\\tag\\s?\\{)([^\\}]+?)(\\}(?:.(?!/disp-formula|\\\\tag))+?</disp-formula>)";
    string replacement2 = "$1$3$2$3$4";
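
An alternative that avoids the double escaping altogether is a verbatim string literal (@"..."), where a backslash is written exactly as in the regex tester and only the double quote is doubled. The following is a minimal sketch rather than a drop-in fix: the class name VerbatimPatternDemo, the Renumber method and the two-block Sample are made up for illustration, and the patterns are the ones from the question with single braces.

```csharp
using System;
using System.Text.RegularExpressions;

class VerbatimPatternDemo
{
    // Verbatim literals: backslashes are written once, only " is doubled as "".
    const string Pattern1 = @"(?s)(<disp-formula id=""deqn)[^""]*?(""(?:.(?!/disp-formula))+?.\\tag\s?\{)([^}]+?)(\}(?:.(?!/disp-formula))+.\\tag\s?\{)([^}]+?)\}";
    const string Replacement1 = "$1$3-$5$2$3$4$5}";

    const string Pattern2 = @"(?s)(<disp-formula id=""deqn)[^""]*?(""(?:.(?!/disp-formula|\\tag))+?.\\tag\s?\{)([^}]+?)(\}(?:.(?!/disp-formula|\\tag))+?</disp-formula>)";
    const string Replacement2 = "$1$3$2$3$4";

    // Hypothetical sample: one block with two tags, one block with a single tag.
    public static readonly string Sample = string.Join("\n", new[] {
        @"<disp-formula id=""deqn*""><text-notation=""math"">\begin{equation*}",
        @"x=5 \tag{5}",
        @"x+y=8 \tag {7}",
        @"\end{equation*}</text-notation=""math""></disp-formula>",
        "",
        @"<disp-formula id=""deqn*""><text-notation=""math"">\begin{equation*}",
        @"x+y=5 \tag{3}",
        @"\end{equation*}</text-notation=""math""></disp-formula>",
    });

    // Applies the multi-tag replacement first, then the single-tag one.
    public static string Renumber(string content)
    {
        string step1 = Regex.Replace(content, Pattern1, Replacement1);
        return Regex.Replace(step1, Pattern2, Replacement2);
    }

    static void Main()
    {
        Console.WriteLine(Renumber(Sample));
    }
}
```

On this sample the first block should come out with id="deqn5-7" and the second with id="deqn3". For the real file, the File.ReadAllText and File.WriteAllText calls from the question can wrap the Renumber call.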

Upvotes: 1

user7177818
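You need to escape { and } as well. Each brace appears only once in the input, so escape a single brace rather than a doubled one, and in a regular string literal the backslash before tag has to be written as \\\\:

"(?+s)(<disp-formula id=\"deqn)[^\"]*?(\"(?:.(?!\\/disp-formula))+?.\\\\tag\\s?\\{)([^\\}]+?)(\\}(?:.(?!/disp-formula))+.\\\\tag\\s?\\{)([^\\}]+?)\\}";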

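A quick way to check how a given escaping behaves is to test small pieces of the pattern against one line of the input. The snippet below is only an illustrative sketch; the class name EscapeCheck and the helper MatchesTag are made up. It compares doubled and single braces, and \\tag against \\\\tag in a regular string literal. Doubled braces are an escape for string.Format and interpolated strings, not for regex patterns.

```csharp
using System;
using System.Text.RegularExpressions;

class EscapeCheck
{
    // One line of the input from the question: x=5 \tag{5}
    const string Line = @"x=5 \tag{5}";

    // Hypothetical helper: does the pattern match anywhere in the sample line?
    public static bool MatchesTag(string pattern)
    {
        return Regex.IsMatch(Line, pattern);
    }

    static void Main()
    {
        // Two escaped braces require "{{" in the text, which is not there.
        Console.WriteLine(MatchesTag(@"\\tag\{\{"));    // False

        // One escaped brace matches the single "{" after \tag.
        Console.WriteLine(MatchesTag(@"\\tag\{"));      // True

        // Regular literal "\\tag" is the regex \tag: a TAB followed by "ag".
        Console.WriteLine(MatchesTag("\\tag\\{"));      // False

        // Regular literal "\\\\tag" is the regex \\tag: a backslash, then "tag".
        Console.WriteLine(MatchesTag("\\\\tag\\{"));    // True
    }
}
```
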
Upvotes: 0
