Teamol

Reputation: 819

Regex in .NET does not seem to work correctly

I want to strip HTML from a string with a regular expression. This regex works everywhere else I have tried it, but it does not work in .NET, and I don't understand why.

using System;
                        
public class Program
{
    public static void Main()
    {
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";
        var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "");
        Console.WriteLine(res);
    }
}

Upvotes: 1

Views: 91

Answers (2)

ProgrammingLlama

Reputation: 38767

You're missing a regex option:

var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "", System.Text.RegularExpressions.RegexOptions.Singleline);

You need it because your HTML contains a newline (\n), and by default . matches every character except \n. With RegexOptions.Singleline, . matches newline characters too, so <.*?> can span the line break inside the tag.

From the documentation for RegexOptions.Singleline:

Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n). For more information, see the "Single-line Mode" section in the Regular Expression Options article.
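To make the difference concrete, here is a minimal sketch that runs the question's input through the same pattern with and without the option. The class and method names (SinglelineDemo, StripDefault, StripSingleline, StripInline) are made up for illustration; the inline (?s) form is an alternative way to switch on the same mode from inside the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the question.
public static class SinglelineDemo
{
    // Same input as in the question: the tag contains a literal newline.
    public const string Text =
        "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";

    // Default options: '.' does not match '\n', so the tag is never matched.
    public static string StripDefault(string input) =>
        Regex.Replace(input, "<.*?>", "");

    // RegexOptions.Singleline: '.' also matches '\n'.
    public static string StripSingleline(string input) =>
        Regex.Replace(input, "<.*?>", "", RegexOptions.Singleline);

    // Inline form of the same option, (?s), embedded in the pattern.
    public static string StripInline(string input) =>
        Regex.Replace(input, "(?s)<.*?>", "");

    public static void Main()
    {
        Console.WriteLine(StripDefault(Text) == Text); // True: nothing was removed
        Console.WriteLine(StripSingleline(Text));      // FOO  BAR
        Console.WriteLine(StripInline(Text));          // FOO  BAR
    }
}
```

Note that two spaces remain between FOO and BAR, because only the tag itself is removed and the spaces on either side of it are kept.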

Upvotes: 5

Antonio Skopin

Reputation: 31

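Try this:

var res = System.Text.RegularExpressions.Regex.Replace(text, "<[^>]*>", "");

The negated character class [^>] matches any character except >, including newlines, so this strips the tag from your string without any extra regex option.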
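As a rough check of this pattern, the sketch below applies it to the question's input and to a second, made-up input with a > inside an attribute value. The class and method names (NegatedClassDemo, Strip) and the second input are assumptions added for illustration. The second case shows a limitation: the match ends at the first >, so part of the tag is left behind.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the answer.
public static class NegatedClassDemo
{
    // [^>] matches any character except '>', newlines included,
    // so no RegexOptions are required.
    public static string Strip(string input) =>
        Regex.Replace(input, "<[^>]*>", "");

    public static void Main()
    {
        // Input from the question: the tag spans a newline.
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";
        Console.WriteLine(Strip(text) == "FOO  BAR"); // True

        // Made-up input with '>' inside an attribute value: the first match
        // stops at that '>', leaving the rest of the tag in the output.
        var tricky = "<a title=\"a > b\">link</a>";
        Console.WriteLine(Strip(tricky) == " b\">link"); // True
    }
}
```

If the input can contain cases like the second one, a regular expression is probably the wrong tool and an HTML parser would be the safer choice.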

Upvotes: 0
