The HTTP GET response for a request looks like this:
<html>
<head> <script type="text/javascript">----</script> <script type="text/javascript">---</script> <title>Detailed Notes</title>
</head>
<body style="background-color: #FFFFFF; border-width: 0px; font-family: sans-serif; font-size: 13; color: #000000"> <p>this is one note </p> </body> </html>
I get this as a string and need to extract the body part from it.
I tried HtmlAgilityPack, but parsing fails because of some special characters in the HTML content (I think something in the commented-out script is causing the issue).
So to read the tag content I am considering a Substring operation, i.e. taking the substring that starts at the <body tag.
How can I take a substring that starts at the beginning of a given word in a text?
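A minimal sketch of that idea in C#: find the word with IndexOf and pass the resulting index to Substring. The helper name SubstringFrom and the shortened sample response are illustrative only. It assumes an exact, case-sensitive (ordinal) match and returns null when the word is missing, because Substring(-1) would throw ArgumentOutOfRangeException.

```csharp
using System;

// Hypothetical helper: returns the text starting at the first occurrence of
// `word`, or null when `word` does not occur (Substring(-1) would throw).
static string? SubstringFrom(string text, string word)
{
    // Ordinal = exact, culture-insensitive, case-sensitive comparison.
    int index = text.IndexOf(word, StringComparison.Ordinal);
    return index >= 0 ? text.Substring(index) : null;
}

// Shortened sample of the response from the question.
string response = "<html><head><title>Detailed Notes</title></head>"
                + "<body style=\"color: #000000\"> <p>this is one note </p> </body></html>";

Console.WriteLine(SubstringFrom(response, "<body"));
// prints: <body style="color: #000000"> <p>this is one note </p> </body></html>
```

Note that the result still contains the opening tag and everything after </body>; trimming those needs a second index (for example from LastIndexOf).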
Using a simple Substring() with IndexOf() + LastIndexOf():
string BodyContent = input.Substring(0, input.LastIndexOf("</body>")).Substring(input.IndexOf("<body")); // everything from "<body" up to (not including) "</body>"
BodyContent = BodyContent.Substring(BodyContent.IndexOf(">") + 1).Trim(); // drop the opening <body ...> tag itself
This will return:
<p>this is one note </p>
string FullBody = input.Substring(0, input.LastIndexOf("</body>") + "</body>".Length).Substring(input.IndexOf("<body")).Trim(); // keep the closing </body> tag (7 characters)
This will return:
<body style="background-color: #FFFFFF; border-width: 0px; font-family: sans-serif; font-size: 13; color: #000000"> <p>this is one note </p> </body>
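The one-liners above assume both tags are present; if either is missing, IndexOf/LastIndexOf return -1 and Substring throws. A minimal guarded sketch of the same IndexOf/LastIndexOf approach, assuming a hypothetical helper name GetBodyContent and a case-insensitive tag match. It is still plain string searching, not HTML parsing, so a ">" inside a <body> attribute value would confuse it.

```csharp
using System;

// Hypothetical helper: returns the inner HTML of the <body> element, or null
// if either tag is missing. Plain string search only, not an HTML parser.
static string? GetBodyContent(string html)
{
    // Case-insensitive so that <BODY> also matches.
    int bodyStart = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
    if (bodyStart < 0) return null;

    // The content begins after the '>' that closes the opening <body ...> tag.
    int contentStart = html.IndexOf('>', bodyStart);
    if (contentStart < 0) return null;
    contentStart++;

    // Last closing tag; also covers the "not found" case (-1 < contentStart).
    int contentEnd = html.LastIndexOf("</body>", StringComparison.OrdinalIgnoreCase);
    if (contentEnd < contentStart) return null;

    return html.Substring(contentStart, contentEnd - contentStart).Trim();
}

string input = "<html><head><title>Detailed Notes</title></head>"
             + "<body style=\"color: #000000\"> <p>this is one note </p> </body></html>";

Console.WriteLine(GetBodyContent(input)); // prints: <p>this is one note </p>
```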
The " characters will cause a problem, so you need to replace every " after you get the page source:
using System.Net;
using System.Text.RegularExpressions;
WebClient client = new WebClient(); // make an instance of WebClient
string source = client.DownloadString("url").Replace("\"", ",,"); // get the HTML source and replace every " with a placeholder (,,)
// sample input used below in place of the downloaded source
string code = "<body style=\"background-color: #FFFFFF; border-width: 0px; font-family: sans-serif; font-size: 13; color: #000000\"> <p>this is one note </p> </body>";
MatchCollection m0 = Regex.Matches(code, "<body[^>]*>(?<body>.*?)</body>", RegexOptions.Singleline); // use a regex to capture the text between the tags ([^>]* skips the <body> attributes)
foreach (Match m in m0) // loop through the results
{
    string result = m.Groups["body"].Value.Replace(",,", "\""); // get the result and turn the placeholder back into "
}