Alex Gordon

Reputation: 60731

C#: rendering HTML into plain text

I want to be able to take HTML code and render plain text out of it.

In other words, this would be my input:

<h3>some text</h3>

I want the result to look like this:

some text

How would I do it?

Upvotes: 0

Views: 455

Answers (4)

Pratik Deoghare

Reputation: 37172

Poor Man's HTML Parser

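        // Needs: using System;
        string s =
            @"
            <html>
            <body>
            <h1>My First Heading</h1>
            <p>My first paragraph.</p>
            </body>
            </html>
        ";

        // Split on '<' so each piece starts with a tag; whatever follows
        // the closing '>' is the text that comes after that tag.
        foreach (var item in s.Split('<'))
        {
            int x = item.IndexOf('>');

            if (x != -1)
            {
                string text = item.Substring(x + 1).Trim();

                // Skip tags that are not followed by any text.
                if (text.Length > 0)
                {
                    Console.WriteLine(text);
                }
            }
        }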

Upvotes: 0

James

Reputation: 82096

You would need to use some form of HTML parser. You could use an existing regex or write your own, but regexes aren't always 100% reliable. I would suggest using a third-party library like HtmlAgilityPack (I have used this one and would recommend it).

Upvotes: 0

sashaeve

Reputation: 9607

Use regex.

String result = Regex.Replace(your_text_goes_here, @"<[^>]*>", String.Empty);
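
A slightly fuller sketch of the same regex idea, in case it helps. `HtmlText` and `ToPlainText` are hypothetical names made up for this example. It assumes simple, well-formed markup: it will not skip `<script>` or `<style>` contents, and it can be confused by a `>` inside an attribute value or comment, so treat it as a quick approximation rather than a real parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class HtmlText
{
    public static string ToPlainText(string html)
    {
        // Replace every tag with a space so text from adjacent
        // elements does not run together.
        string noTags = Regex.Replace(html, @"<[^>]*>", " ");

        // Decode entities such as &amp; and &lt; into plain characters.
        string decoded = WebUtility.HtmlDecode(noTags);

        // Collapse runs of whitespace and trim the ends.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

With the input from the question, `HtmlText.ToPlainText("<h3>some text</h3>")` returns `some text`.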

Upvotes: 1

Justin Niessner

Reputation: 245419

I would suggest trying the HTML Agility Pack for .NET:

Html Agility Pack - Codeplex

Attempting to parse HTML with anything else is, for the most part, unreliable.

Whatever you do, DON'T TRY TO PARSE HTML WITH REGEX!

Upvotes: 3
