Ian

Reputation: 30813

Regex for retaining numbers in the replacement of group containing numbers

Regarding the possible duplicate: Replace only some groups with Regex

This is not a duplicate. That post replaces the group with static text, whereas I want the replacement to retain the text captured by the group.

I have some text containing patterns like:

\super 1 \nosupersub
\super 2 \nosupersub
...
\super 592 \nosupersub

I want to replace them using regex such that they become:

<sup>1</sup>
<sup>2</sup>
...
<sup>592</sup>

So, I am using the following regex (note the group (\d+)):

RegexOptions options = RegexOptions.Multiline; //as of v1.3.1.0 default is multiline
mytext = Regex.Replace(mytext, @"\s?\\super\s?(\d+)\s?\\nosupersub\s", @"<sup>\1</sup>", options);

However, instead of getting what I want, every match is replaced with the literal text <sup>\1</sup>:

<sup>\1</sup>
<sup>\1</sup>
...
<sup>\1</sup>

If I try the same regex replacement in a text editor like https://www.sublimetext.com, or in Python, it works as expected.

How can I make the replacement retain the number captured by (\d+) in C#?

Upvotes: 1

Views: 244

Answers (2)

TheLethalCoder

Reputation: 6744

I haven't tested this code and wrote it from memory, so it might not work as-is, but the general idea is there.

Why use regex at all?

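List<string> output = new List<string>();
foreach (string line in myText.Split(new string[] { Environment.NewLine }, StringSplitOptions.None))
{
    // Verbatim strings (@"...") keep the backslashes literal; a plain "\super" would not compile
    string alteredLine = line.Replace(@"\super", "").Replace(@"\nosupersub", "").Trim();

    int n;
    if (Int32.TryParse(alteredLine, out n))
    {
        output.Add("<sup>" + n + "</sup>");
    }
    else
    {
        // Keep the original line when it does not reduce to a number
        output.Add(line);
    }
}
// Join the processed lines back into a single string
myText = string.Join(Environment.NewLine, output);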

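Or, as a LINQ version (note that this one wraps every line without checking that it is a number):

// Requires using System.Linq; string.Join turns the projected lines back into one string
myText = string.Join(Environment.NewLine,
    myText.Split(new string[] { Environment.NewLine }, StringSplitOptions.None)
          .Select(l => "<sup>" + l.Replace(@"\super", "").Replace(@"\nosupersub", "").Trim() + "</sup>"));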

Upvotes: 1

Steven Doggart

Reputation: 43743

Many regex tools use the \1 notation to refer to a group's value in the replacement pattern (the same syntax as a backreference). For whatever reason, Microsoft chose to use $1 instead in the .NET implementation of regex. Note that backreferences inside the search pattern still use the \1 syntax in .NET; only the syntax in the replacement pattern is different. See the Substitutions section of this page for more info.
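Applied to the question's code, the fix could look roughly like the sketch below. The search pattern is copied from the question unchanged; the class name SuperscriptDemo, the method name ToSupTags, and the sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuperscriptDemo
{
    // Hypothetical helper; the search pattern is the one from the question.
    public static string ToSupTags(string text)
    {
        // In a .NET replacement string, $1 inserts the text captured by group 1.
        // Writing \1 here would be emitted literally, which is the behavior the question saw.
        return Regex.Replace(
            text,
            @"\s?\\super\s?(\d+)\s?\\nosupersub\s",
            "<sup>$1</sup>");
    }

    public static void Main()
    {
        // Illustrative input: two markers, each followed by a newline.
        string sample = @"\super 1 \nosupersub" + "\n" + @"\super 592 \nosupersub" + "\n";
        Console.WriteLine(ToSupTags(sample));
    }
}
```

Because the pattern ends with a mandatory \s, the newline after each marker is consumed by the match, so the two tags in this sample come out adjacent to each other. If the line breaks should survive, the trailing \s would need to be dropped or captured and re-emitted.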

Upvotes: 2
