Reputation: 75
I know how to read a line from a txt file, but for some reason C# is not detecting the end of line in HTML files. This code opens the HTML file and tries to parse it line by line in search of the specified string. Even when I just try to print the first line of text in the HTML file, nothing is displayed.
using (StreamReader sr = new StreamReader("\\\\server\\myFile.html"))
{
    String line;
    while ((line = sr.ReadLine()) != null)
    {
        if(line == ("<td><strong>String I wantstrong></td>"))
        {
            Label1.Text = "Text Found";
            break;
        }
    }
}
I have tried this with a plain txt file and it works perfectly, just not when trying to parse an HTML file.
Thanks.
Upvotes: 4
Views: 7716
Reputation: 46893
Your outer loop that reads lines works fine. My guess is that one of the following is taking place:
In either case, you won't see anything printed.
Now, to your loop:
You likely don't see what you expect, because
    if(line == ("<td><strong>String I wantstrong></td>"))
    {
        Label1.Text = "Text Found";
        break;
    }
looks for an EXACT match. If this is your actual code, you're missing the opening </ on </strong>, and you're likely forgetting that there is whitespace (indentation) in your HTML content.
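As a minimal sketch of the point above, the comparison can be relaxed from an exact match to a substring test, so that leading indentation on the line no longer matters. The class and method names (LineSearch, ContainsFragment) are hypothetical and chosen only for illustration, and the fragment assumes the closing tag is written correctly as </strong>.

```csharp
using System;
using System.IO;

public static class LineSearch
{
    // Hypothetical helper: returns true if any line read from the reader
    // contains the given fragment. Indentation before or text after the
    // fragment on the same line does not prevent a match.
    public static bool ContainsFragment(TextReader reader, string fragment)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Substring test instead of ==, case-insensitive because
            // HTML tag names may be written in either case.
            if (line.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

With the question's code, this could be called as LineSearch.ContainsFragment(sr, "<td><strong>String I want</strong></td>") inside the using block, setting Label1.Text when it returns true. It still fails if the markup is split across several lines, which is one reason a real HTML parser is usually the safer choice.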
Upvotes: 0
Reputation: 28319
You don't need to reinvent the wheel. A much better way to parse HTML is to use an HTML parser:
http://htmlagilitypack.codeplex.com/ or http://www.justagile.com/linq-to-html.aspx
There is also a similar question here: What is the best way to parse html in C#?
Hope it helps.
Upvotes: 3
Reputation: 371
The best way by far is to use the HTML Agility Pack.
More about this can be found in a previous Stack Overflow question.
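A rough sketch of what this might look like for the question's case, assuming the HtmlAgilityPack package is referenced in the project. The class and method names (AgilitySearch, HasStrongCell) are hypothetical, and the XPath //td/strong is an assumption about where the wanted text sits in the page.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class AgilitySearch
{
    // Hypothetical helper: returns true if some <strong> element directly
    // inside a <td> has exactly the wanted text (ignoring surrounding whitespace).
    public static bool HasStrongCell(string html, string wanted)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td/strong");
        if (nodes == null)
        {
            return false;
        }
        return nodes.Any(n => n.InnerText.Trim() == wanted);
    }
}
```

To read the file from the question directly, doc.Load with the file path can be used in place of doc.LoadHtml. Because the parser works on the document tree, line breaks and indentation in the markup no longer affect the result.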
Upvotes: 4
Reputation: 1989
If you know the HTML you are parsing is XHTML, why not parse it as XML using System.Xml?
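A minimal sketch of that idea, using System.Xml.Linq rather than the older System.Xml classes and assuming the input is well-formed XHTML. The names XhtmlSearch and ContainsStrongCell are hypothetical. Elements are matched by local name so that the XHTML namespace on the root element does not get in the way.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlSearch
{
    // Hypothetical helper: returns true if a <strong> element whose parent
    // is a <td> has exactly the wanted text. Throws XmlException if the
    // input is not well-formed XML.
    public static bool ContainsStrongCell(string xhtml, string wanted)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants()
            .Where(e => e.Name.LocalName == "strong"
                     && e.Parent != null
                     && e.Parent.Name.LocalName == "td")
            .Any(e => e.Value.Trim() == wanted);
    }
}
```

This only works when the page really is well-formed. Ordinary HTML with unclosed tags will make the parser throw, and named entities such as &nbsp; may need extra DTD handling.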
Upvotes: 0