Reputation: 2491
For example, given this HTML:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>title</title>
</head>
<body>
<a href="aaa.asp?id=1"> I want to get this text </a>
<div>
<h1>this is my want!!</h1>
<b>this is my want!!!</b>
</div>
</body>
</html>
I want the result to be:
I want to get this text
this is my want!!
this is my want!!!
Upvotes: 24
Views: 31098
Reputation: 31383
You can start with the simple function below. Disclaimer: this code is suitable for basic HTML, but it will not handle every valid HTML construct or edge case. Tag characters inside quoted attribute values are one example. The advantage of this code is that you can easily follow its execution in a debugger, and it can easily be modified to handle the edge cases specific to your input.
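// Requires: using System.Text;
public static string RemoveTags(string html)
{
    // StringBuilder avoids allocating a new string for every appended character.
    var result = new StringBuilder(html.Length);
    bool insideTag = false;
    foreach (char c in html)
    {
        if (c == '<')
            insideTag = true;   // start of a tag: stop copying
        if (!insideTag)
            result.Append(c);   // plain text: keep it
        if (c == '>')
            insideTag = false;  // end of a tag: resume copying
    }
    return result.ToString();
}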
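As one illustration of adapting the function to an edge case, here is a minimal sketch that handles the quoted-attribute case mentioned in the disclaimer. The class name `HtmlText` and the method name `RemoveTagsQuoteAware` are made up for this example. It still assumes simple markup: an apostrophe inside an HTML comment or a bare `<` in the text would confuse it.

```csharp
using System.Text;

public static class HtmlText
{
    // Hypothetical variant of RemoveTags: while inside a tag, a '>' that
    // appears within a quoted attribute value does not end the tag.
    public static string RemoveTagsQuoteAware(string html)
    {
        var result = new StringBuilder(html.Length);
        bool insideTag = false;
        char quote = '\0'; // '\0' means "not inside a quoted value"

        foreach (char c in html)
        {
            if (insideTag)
            {
                if (quote != '\0')
                {
                    if (c == quote) quote = '\0'; // closing quote
                }
                else if (c == '"' || c == '\'')
                {
                    quote = c;                    // opening quote
                }
                else if (c == '>')
                {
                    insideTag = false;            // real end of the tag
                }
            }
            else if (c == '<')
            {
                insideTag = true;                 // start of a tag
            }
            else
            {
                result.Append(c);                 // plain text: keep it
            }
        }
        return result.ToString();
    }
}
```

With an input such as `<a title="a > b">link</a>`, the simpler function treats the `>` inside the title as the end of the tag and leaks part of the attribute into the output, while this variant keeps only the link text.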
Upvotes: 0
Reputation: 940
Use this function...
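// Requires: using System.Text.RegularExpressions;
public string Strip(string text)
{
    // Matches "<", then any characters up to the next ">", newlines included.
    return Regex.Replace(text, @"<[^>]*>", string.Empty);
}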
Upvotes: 17
Reputation: 2082
If you just want to remove the HTML tags, use a regular expression that deletes everything between "<" and ">", including the brackets themselves.
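A minimal sketch of that idea, assuming simple markup with no `>` inside attribute values. The names `TagStripper` and `StripTags` are made up for this example. The entity-decoding step goes beyond what the answer describes, so drop it if you want the raw text.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Hypothetical helper: deletes every "<...>" run, then decodes
    // entities such as &amp; and trims the surrounding whitespace.
    public static string StripTags(string html)
    {
        string withoutTags = Regex.Replace(html, "<[^>]*>", string.Empty);
        return WebUtility.HtmlDecode(withoutTags).Trim();
    }
}
```

For instance, `TagStripper.StripTags("<h1>this is my want!!</h1>")` yields `this is my want!!`.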
Upvotes: 0
Reputation: 1064114
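// HtmlDocument comes from the Html Agility Pack library (using HtmlAgilityPack;).
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// InnerText concatenates the text of every node under <body>.
string s = doc.DocumentNode.SelectSingleNode("//body").InnerText;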
Upvotes: 31
Reputation: 187110
Why do you want to do this on the server side?
To do so, you have to mark the container element with runat="server" and then read the element's InnerText.
You can do the same in JavaScript without making the element runat="server".
Upvotes: 0
Reputation: 70011
I would recommend using something like HTML Tidy.
Here's a tutorial on it to get you started.
Upvotes: 1