Reputation: 2491
For example:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>title</title>
</head>
<body>
<a href="aaa.asp?id=1"> I want to get this text </a>
<div>
<h1>this is my want!!</h1>
<b>this is my want!!!</b>
</div>
</body>
</html>
and the result is:
I want to get this text
this is my want!!
this is my want!!!
Upvotes: 24
Views: 31045
Reputation: 31337
You can start with the simple function below. Disclaimer: this code is suitable for basic HTML, but it will not handle every valid HTML construct or edge case. Tag characters inside quoted attribute values are one example. The advantage of this code is that you can easily follow its execution in a debugger, and it can easily be modified to fit the edge cases specific to you.
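// Requires: using System.Text;
public static string RemoveTags(string html)
{
    // StringBuilder avoids allocating a new string for every appended character.
    StringBuilder result = new StringBuilder(html.Length);
    bool insideTag = false;
    for (int i = 0; i < html.Length; ++i)
    {
        char c = html[i];
        if (c == '<')
            insideTag = true;   // a tag starts: stop copying
        if (!insideTag)
            result.Append(c);   // plain text between tags
        if (c == '>')
            insideTag = false;  // the tag ends: resume copying after it
    }
    return result.ToString();
}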
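The quoted-attribute edge case mentioned in the disclaimer can be covered with one extra piece of state. The following is only a sketch under that assumption: `HtmlText` and `RemoveTagsQuoteAware` are hypothetical names, not part of the answer above, and the code still ignores comments, `<script>` blocks and entities such as `&amp;`.

```csharp
using System;
using System.Text;

public static class HtmlText
{
    // Hypothetical variant of RemoveTags: a '>' inside a quoted
    // attribute value does not end the tag.
    public static string RemoveTagsQuoteAware(string html)
    {
        var result = new StringBuilder(html.Length);
        bool insideTag = false;
        char quote = '\0'; // the open quote character, or '\0' when not in a quoted value

        foreach (char c in html)
        {
            if (!insideTag)
            {
                if (c == '<')
                    insideTag = true;   // a tag starts: stop copying
                else
                    result.Append(c);   // plain text between tags
            }
            else if (quote != '\0')
            {
                if (c == quote)
                    quote = '\0';       // the quoted value ends
            }
            else if (c == '"' || c == '\'')
            {
                quote = c;              // a quoted attribute value starts
            }
            else if (c == '>')
            {
                insideTag = false;      // the tag ends outside any quotes
            }
        }
        return result.ToString();
    }
}
```

For example, `HtmlText.RemoveTagsQuoteAware("<a title=\"a > b\">link</a>")` returns `link`, whereas the simpler loop would also copy ` b">` into the output.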
Upvotes: 0
Reputation: 930
Use this function...
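// Requires: using System.Text.RegularExpressions;
public string Strip(string text)
{
    // Lazily match '<', any characters (including newlines), then the next '>'.
    return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
}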
Upvotes: 17
Reputation: 2082
If you just want to remove the HTML tags, use a regular expression that deletes everything from each "<" up to and including the next ">".
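A minimal sketch of that idea, assuming well-formed tags with no ">" inside attribute values. `TagStripper` and `StripTags` are hypothetical names, and the entity-decoding step via `WebUtility.HtmlDecode` is an optional addition that the answer does not mention.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Matches '<', then any run of characters other than '>', then '>'.
    // The negated class also matches newlines, so multi-line tags are covered.
    private static readonly Regex Tag = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Hypothetical helper: removes tags, decodes entities, trims the ends.
    public static string StripTags(string html)
    {
        string withoutTags = Tag.Replace(html, string.Empty);
        return WebUtility.HtmlDecode(withoutTags).Trim();
    }
}
```

Called as `TagStripper.StripTags("<b>a &amp; b</b>")`, this returns `a & b`. Like any regex approach, it treats the input as flat text, so script bodies and comments containing ">" will not be handled correctly.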
Upvotes: 0
Reputation: 1062540
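// Uses the Html Agility Pack library: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectSingleNode returns null if the document has no <body> element.
string s = doc.DocumentNode.SelectSingleNode("//body").InnerText;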
Upvotes: 31
Reputation: 187020
Why do you want to do this on the server side?
For that, you have to mark the container element with runat="server" and then read the innerText of the element.
You can do the same in JavaScript without marking the element with runat="server".
Upvotes: 0
Reputation: 69981
I would recommend using something like HTMLTidy.
Here's a tutorial on it to get you started.
Upvotes: 1