Rupert

Reputation: 75

How to read a line of HTML using C#

I know how to read a line from a text file, but for some reason C# does not seem to detect the end of line in HTML files. This code opens the HTML file and reads it line by line, searching for the specified string. Even when I just try to print the first line of the HTML file, nothing is displayed.

using (StreamReader sr = new StreamReader("\\\\server\\myFile.html"))
{
    String line;
    while ((line = sr.ReadLine()) != null)
    {
        if(line == ("<td><strong>String I wantstrong></td>"))
        {
            Label1.Text = "Text Found";
            break;
        }
    }
}

I have tried this using a plain txt file and it works perfectly, just not when trying to parse an HTML file.

Thanks.

Upvotes: 4

Views: 7716

Answers (4)

Alan

Reputation: 46893

Your outer loop that reads lines works fine. My guess is that one of the following is taking place:

  • The HTML file is empty
  • The first line in the HTML file is empty

In either case, you won't see anything printed.

Now, to your loop:

You likely don't see what you expect, because

 if(line == ("<td><strong>String I wantstrong></td>"))
 {
    Label1.Text = "Text Found";
    break;
 }

looks for an EXACT match. If this is your actual code, you're missing the opening </ on </strong>, and you're likely forgetting that there is whitespace (indentation) in your HTML content.
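A minimal sketch of a more forgiving comparison, assuming the goal is only to detect whether any single line contains the markup. The names LineSearch and ContainsMarkup are made up for illustration. The reader is passed in as a TextReader so the same method works with the StreamReader from the question.

```csharp
using System;
using System.IO;

public static class LineSearch
{
    // Returns true if any line of the input contains the target markup.
    // A substring search ignores leading indentation and any other
    // markup that shares the line with the target.
    public static bool ContainsMarkup(TextReader reader, string target)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // OrdinalIgnoreCase is an assumption: it lets <TD> match <td>.
            if (line.IndexOf(target, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

With the reader from the question this could be called as LineSearch.ContainsMarkup(sr, "<td><strong>String I want</strong></td>"). It still fails if the element is split across several lines, which is one reason a real parser is usually the safer choice.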

Upvotes: 0

angularconsulting.au

Reputation: 28319

You don't need to reinvent the wheel. A much better way to parse HTML is to use an HTML parser:

http://htmlagilitypack.codeplex.com/ or http://www.justagile.com/linq-to-html.aspx

A similar question has also been asked here: What is the best way to parse html in C#?

Hope it helps.

Upvotes: 3

Gaven

Reputation: 371

The best way by far is to use the HTML Agility Pack.

More about this can be found in a previous Stack Overflow question:

Looking for C# HTML parser

Upvotes: 4

KK99

Reputation: 1989

If you know the HTML you are parsing is XHTML, why not parse it as XML using System.Xml?
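A rough sketch of what that might look like, assuming the file really is well-formed XML. It uses XDocument from System.Xml.Linq, which is my choice of API rather than something this answer specifies. XhtmlSearch and HasStrongCell are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlSearch
{
    // Returns true if some <strong> element directly inside a <td>
    // has exactly the given text (ignoring surrounding whitespace).
    public static bool HasStrongCell(string xhtml, string text)
    {
        // Throws System.Xml.XmlException if the input is not well-formed.
        XDocument doc = XDocument.Parse(xhtml);

        // Compare LocalName so the XHTML namespace does not matter.
        return doc.Descendants()
            .Where(e => e.Name.LocalName == "strong"
                        && e.Parent != null
                        && e.Parent.Name.LocalName == "td")
            .Any(e => e.Value.Trim() == text);
    }
}
```

The file contents could be supplied with File.ReadAllText. Ordinary HTML with unclosed tags, or named entities the XML parser does not know, is likely to be rejected, so this only suits input you know to be XHTML.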

Upvotes: 0
