preachers

Reputation: 381

Combining .net and powershell regex capture groups syntax

I'm having trouble combining PowerShell's native regex capture-group syntax ($n) with the .NET one ($args.Groups[n].Value). The HTML is as follows:

ab
<link rel="stylesheet" type="text/css" href="stylesheet.css">
<b style="color:black;font-size:110%">ab</b><br>
<span class="WordHead"><b>ab</b></span> <span class="IPA">[ap]</span><br>
<span class="RomArticles"><span class="RomNum">Ⅰ.</span><a id="Ⅰ."></a><i><font color="darkblue">adv</font></i></span><br>
<span class="NumBracket">&nbsp;<b><font color="sienna">1)</font></b><a id="1)"></a> <i><font color="darkgreen">(weg, entfernt)</font></i> off;</span><br>
<span class="AllExamples">&nbsp;&nbsp;<b><font color="#5b4636">zur Post geht es an der Kreuzung links ~</font></b> the post office is off to the left at the crossroads;<br>
&nbsp;&nbsp;<b><font color="#5b4636">~ sein</font></b> to be out in the sticks;<br>
&nbsp;&nbsp;<b><font color="#5b4636">weit ~ sein</font></b> [<i><font color="black">o</font></i> <b>liegen</b>] to be far away;<br>
&nbsp;&nbsp;<b><font color="#5b4636">das Lokal ist mir zu weit ab</font></b> the pub is too far away;<br>
&nbsp;&nbsp;<b><font color="#5b4636">das liegt zu weit ~ vom Weg</font></b> that's too far off the beaten track</span><br>
<span class="NumBracket">&nbsp;<b><font color="sienna">2)</font></b><a id="2)"></a> <i><font color="darkgreen">(abgetrennt)</font></i> off;</span><br>
<span class="AllExamples">&nbsp;&nbsp;<b><font color="#5b4636">~ sein</font></b> <abbr title="informal" class="Icon">fam</abbr> to be broken [off];<br>
&nbsp;&nbsp;<b><font color="#5b4636">mein Knopf ist ab</font></b> I've lost a button;<br>
&nbsp;&nbsp;<b><font color="#5b4636">erst muss die alte Farbe ~</font></b> first you have to remove the old paint</span><br>
<span class="NumBracket">&nbsp;<b><font color="sienna">3)</font></b><a id="3)"></a> <i><font color="darkgreen">(in Befehlen)</font></i> off;</span><br>
<span class="AllExamples">&nbsp;&nbsp;<b><font color="#5b4636">~ ins Bett!</font></b> off to bed!;<br>
&nbsp;&nbsp;<b><font color="#5b4636">~, ihr beiden, Hände waschen!</font></b> off you two go, and wash your hands!;<br>
&nbsp;&nbsp;<b><font color="#5b4636">~ nach Hause!</font></b> off home with you!;<br>
&nbsp;&nbsp;<b><font color="#5b4636">~ in</font></b>/<b><font color="#5b4636">auf dein Zimmer!</font></b> go to your room!;<br>
&nbsp;&nbsp;<b><font color="#5b4636">~ nach oben</font></b>/<b><font color="#5b4636">unten!</font></b> up/down we/you etc. go!;<br>
&nbsp;&nbsp;<b><font color="#5b4636">~ sofort</font></b> as of now;<br>
&nbsp;&nbsp;<b><font color="#5b4636">~ und zu</font></b> [<i><font color="black">o</font></i> <span class="region"><abbr title="Northern German" class="nordd">NORDD</abbr></span> <b>an</b>] now and then</span><br>
<span class="NumBracket">&nbsp;<b><font color="sienna">4)</font></b><a id="4)"></a> <i><font color="darkgreen">(abgehend)</font></i> from;</span><br>
<span class="AllExamples">&nbsp;&nbsp;<b><font color="#5b4636">der Zug fährt ~ Köln</font></b> the train departs from Cologne;<br>
&nbsp;&nbsp;<b><font color="#5b4636">Frankfurt ~ 19 Uhr, New York an 8 Uhr</font></b> departing Frankfurt [at] 19.00, arriving New York [at] 8.00 </span><br>
<span class="RomArticles"><span class="RomNum">Ⅱ.</span><a id="Ⅱ."></a><i><font color="darkblue">präp</font> <font color="black">+dat</font></i></span><br>
<span class="NumBracket">&nbsp;<b><font color="sienna">1)</font></b><a id="1)"></a> <i><font color="darkgreen">(räumlich)</font></i> from</span><br>
<span class="NumBracket">&nbsp;<b><font color="sienna">2)</font></b><a id="2)"></a> <i><font color="darkgreen">(zeitlich)</font></i> from;</span><br>
<span class="AllExamples">&nbsp;&nbsp;<b><font color="#5b4636">~ wann ...?</font></b> from when ...?</span><br>
<span class="NumBracket">&nbsp;<b><font color="sienna">3)</font></b><a id="3)"></a> <i><font color="darkgreen">(von ... aufwärts)</font></i> from;</span><br>
<span class="AllExamples">&nbsp;&nbsp;<b><font color="#5b4636">Kinder ~ 14 Jahren</font></b> children from the age of 14 up</span><br>
<span class="NumBracket">&nbsp;<b><font color="sienna">4)</font></b><a id="4)"></a> <span class="Categories">ökon</span> ex;</span><br>
<span class="AllExamples">&nbsp;&nbsp;<b><font color="#5b4636">Preis ~ Fabrik</font></b>/<b><font color="#5b4636">Werk</font></b> price ex factory/works</span><br>
<span class="NumBracket">&nbsp;<b><font color="sienna">5)</font></b><a id="5)"></a> <span class="region"><abbr title="Swiss" class="schweiz">SCHWEIZ</abbr></span> <i><font color="darkgreen">(nach der Uhrzeit)</font></i> past;</span><br>
<span class="AllExamples">&nbsp;&nbsp;<b><font color="#5b4636">Viertel ~ 8</font></b> quarter past eight</span><br>
<span class="NumBracket">&nbsp;<b><font color="sienna">6)</font></b><a id="6)"></a> <span class="region"><abbr title="Swiss" class="schweiz">SCHWEIZ</abbr></span> <i><font color="darkgreen">(von)</font></i> on;</span><br>
<span class="AllExamples">&nbsp;&nbsp;<b><font color="#5b4636">~ Kassette</font></b> on cassette</span><br>

I want to insert the Roman numerals into the <a id="n)"></a> tags. This is my code, which is not working:

$content = [System.IO.File]::ReadAllText("C:\test.txt", [System.Text.Encoding]::UTF8)
$result  = [regex]::Replace( $content, '(?smi)<a id="([Ⅰ-Ⅹ]\.)"></a>(?:(?!<br>).)+<br>\r\n\t(?:(?!<br>).)+<a id="\d*\)"></a>(?:(?!</>|\t<span class="(?:RomArticles|NumDotArticles|Phrases)">).)+(?=</>|\t<span class="(?:RomArticles|Phrases|NumDotArticles)">)',
{$args.value -replace '(<a id=")(\d*\)"></a>)', "$1$args.groups[1].value$2"})

How can I mix these two syntaxes in one piece of code to get the desired result?

Upvotes: 1

Views: 92

Answers (1)

Mathias R. Jessen

Reputation: 175085

There's only one syntax: -replace internally calls Regex.Replace(), and Regex.Replace() also supports $N references.

Your problem is two-fold - first, when you use double-quotes, like so: "$1", PowerShell will attempt to expand/resolve $1 as a variable before the substitution pattern is passed to -replace.

The second problem is that PowerShell only expands simple variable references in double-quoted strings, not whole expressions such as property access or indexing. For those, you need to enclose the expression in a subexpression, $().

So, either escape each $ that belongs to the regex substitution with a backtick (`) and enclose the expression in $(), or build the substitution from a single-quoted string:

$args.value -replace '(<a id=")(\d*\)"></a>)', "`$1$($args.groups[1].value)`$2"
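
To make the two expansion problems visible, here is a minimal sketch that prints what PowerShell actually hands to -replace in each case. The variable $m, the ASCII sample tag, and the simplified pattern are assumptions for illustration only; $m stands in for the match object the callback receives. It also assumes strict mode is off and that no variables named 1 or 2 exist in the session.

```powershell
# Hypothetical stand-in for the match object passed to the callback.
# ASCII "II." is used instead of the Roman numeral characters to avoid encoding issues.
$m = [regex]::Match('<a id="II."></a>', '<a id="([IVX]+\.)"></a>')

# Unescaped: PowerShell expands $1 and $2 itself (undefined, so empty) and
# expands only $m (via ToString()), leaving ".Groups[1].Value" as literal text.
$broken = "$1$m.Groups[1].Value$2"

# Escaped group references plus a subexpression for the property access.
$fixed = "`$1$($m.Groups[1].Value)`$2"

$broken   # <a id="II."></a>.Groups[1].Value
$fixed    # $1II.$2
```

Only the second string still contains the literal $1 and $2 that the regex engine needs to see.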

... or prepare the string separately before calling -replace, perhaps with the -f string format operator:

$sub = '$1{0}$2' -f $args.Groups[1].Value
$args.Value -replace '(<a id=")(\d*\)"></a>)', $sub

If you switch to a newer version of PowerShell (6.1 or newer), you don't need to call Regex.Replace directly to take advantage of a dynamic match evaluator. -replace accepts a script block as the replacement, and $_ inside it refers to the current match:

$content -replace $pattern,{
  return $_.Value -replace '(<a id=")(\d*\)"></a>)', "`$1$($_.Groups[1].Value)`$2"
}
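
As a rough end-to-end check, the sketch below runs the Regex.Replace approach on a tiny sample, so it should also work in Windows PowerShell. The sample string, the simplified patterns, and the names $sample, $pattern, $callback and $roman are assumptions for illustration, not the asker's real data or regex. It writes the group references as ${1} and ${2} in single-quoted strings, which sidesteps PowerShell expansion entirely and keeps the reference unambiguous even if the inserted text were to start with a digit.

```powershell
# Hypothetical sample: one Roman-numeral anchor followed by one numbered anchor.
$sample  = '<a id="II."></a><i>adv</i><br><a id="1)"></a> off;'
# Simplified outer pattern (an assumption): capture the numeral, span to the numbered anchor.
$pattern = '<a id="([IVX]+\.)"></a>.*?<a id="\d+\)"></a>'

# Script block used as the match evaluator; $m is the current outer match.
$callback = {
    param($m)
    $roman = $m.Groups[1].Value
    # Single-quoted ${1}/${2} reach the regex engine untouched by PowerShell.
    $m.Value -replace '(<a id=")(\d+\)"></a>)', ('${1}' + $roman + '${2}')
}

$result = [regex]::Replace($sample, $pattern, $callback)
$result   # <a id="II."></a><i>adv</i><br><a id="II.1)"></a> off;
```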

Upvotes: 2
