guaike

Reputation: 2491

How to clean HTML tags using C#

For example:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>title</title>
</head>
<body>
    <a href="aaa.asp?id=1"> I want to get this text </a>
    <div>
        <h1>this is my want!!</h1>
        <b>this is my want!!!</b>
    </div>
</body>
</html>

and the result should be:

 I want to get this text 
this is my want!!
this is my want!!!

Upvotes: 24

Views: 31045

Answers (6)

James Lawruk

Reputation: 31337

You can start with the simple function below. Disclaimer: this code is suitable for basic HTML, but it will not handle all valid HTML or every edge case; a "<" or ">" inside a quoted attribute value is one example. The advantage of this code is that you can easily follow its execution in a debugger, and it can easily be modified to fit the edge cases specific to you.

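public static string RemoveTags(string html)
{
    // StringBuilder avoids allocating a new string for every appended character.
    var result = new System.Text.StringBuilder(html.Length);
    bool insideTag = false;
    foreach (char c in html)
    {
        if (c == '<')
            insideTag = true;   // start skipping at the opening bracket
        if (!insideTag)
            result.Append(c);
        if (c == '>')
            insideTag = false;  // resume copying after the closing bracket
    }
    return result.ToString();
}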
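
As a rough illustration of adapting the loop to the quoted-attribute edge case mentioned in the disclaimer, here is a sketch that also tracks whether the scanner is inside a quoted attribute value. The class name TagStripper and the method name RemoveTagsQuoteAware are made up for this example, and it still makes no attempt to handle comments, scripts, or malformed markup.

```csharp
using System.Text;

public static class TagStripper
{
    // Hypothetical variant of RemoveTags: a '<' or '>' that appears inside a
    // quoted attribute value (e.g. <a title="a > b">) does not end the tag.
    public static string RemoveTagsQuoteAware(string html)
    {
        var result = new StringBuilder(html.Length);
        bool insideTag = false;
        char quote = '\0'; // the quote character currently open, or '\0' if none

        foreach (char c in html)
        {
            if (insideTag)
            {
                if (quote != '\0')
                {
                    if (c == quote)
                        quote = '\0';       // closing quote of the attribute value
                }
                else if (c == '"' || c == '\'')
                {
                    quote = c;              // opening quote of an attribute value
                }
                else if (c == '>')
                {
                    insideTag = false;      // unquoted '>' ends the tag
                }
            }
            else if (c == '<')
            {
                insideTag = true;           // start of a tag: stop copying
            }
            else
            {
                result.Append(c);           // plain text outside any tag
            }
        }
        return result.ToString();
    }
}
```

Calling TagStripper.RemoveTagsQuoteAware("<a title=\"a > b\">link</a>") returns "link", whereas the simpler loop would stop skipping at the first ">" and leave part of the attribute in the output.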

Upvotes: 0

diegodsp

Reputation: 930

Use this function (it requires the System.Text.RegularExpressions namespace):

public string Strip(string text)
{
    return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
}

Upvotes: 17

Andrew Marsh

Reputation: 2082

If you just want to remove the HTML tags, use a regular expression that deletes everything from each "<" to the next ">".
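
A minimal sketch of that idea, assuming the input is reasonably simple HTML: the names HtmlText and StripTags are invented for illustration, and the entity-decoding step is an addition that the answer above does not mention. Like any regex approach, it can be fooled by a ">" inside an attribute value or by script and comment blocks.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Matches a '<', then any run of characters that are not '>', then '>'.
    // The negated class [^>] also matches newlines, so multi-line tags are covered.
    private static readonly Regex Tag = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Hypothetical helper: removes tags, then decodes entities such as &amp;.
    public static string StripTags(string html)
    {
        string text = Tag.Replace(html, string.Empty);
        return WebUtility.HtmlDecode(text);
    }
}
```

For instance, HtmlText.StripTags("<b>this is my want!!!</b>") returns "this is my want!!!". Whitespace between the removed tags is left untouched, so the result may need trimming.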

Upvotes: 0

Marc Gravell

Reputation: 1062540

HTML Agility Pack:

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);
    string s = doc.DocumentNode.SelectSingleNode("//body").InnerText;

Upvotes: 31

rahul

Reputation: 187020

Why do you want to do this on the server side?

To do that, you have to mark the container element with runat="server" and then read the element's InnerText.

You can do the same in JavaScript without marking the element with runat="server".

Upvotes: 0

Ólafur Waage

Reputation: 69981

I would recommend using something like HTMLTidy.

Here's a tutorial on it to get you started.

Upvotes: 1
