Parsing HTML Document using Html Agility Pack

Question

I am working on a project in which I have to extract data from a website and store it in Access. I am able to read the data from the website and save it as an HTML document, but now I want to parse that HTML document and store the data in Access. The following are the contents of the HTML file:

NCEDC_Search_Results

Your search parameters are:

start_time=2002/01/01,00:00:00
end_time=2037/01/01,00:00:00
minimum_magnitude=3.0
maximum_magnitude=10
etype=E
rflag=A,F,H,I
system=selected
format=ncread

Date       Time             Lat       Lon  Depth   Mag Magt  Nst Gap  Clo  RMS  SRC   Event ID
----------------------------------------------------------------------------------------------
2002/01/10 00:44:51.53  40.4415 -126.0167  25.37  3.92   Md   56 269  147 0.29 NCSN   21208454 
2002/01/12 04:41:46.93  36.7690 -121.4812   7.74  3.06   Md   54  35    5 0.09 NCSN   21208721

I want the contents between the <pre> tags. The column names are as given in the HTML above.

How can I achieve this using Html Agility Pack in C#? I tried the following code, but how do I proceed further?

string txt = null;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("E://text.html");
HtmlNode node = doc.DocumentNode;
HtmlNodeCollection pre = node.SelectNodes("//pre");
//var prenodes = doc.DocumentNode.SelectNodes("//pre");
if (pre != null)
{

}

Console.ReadKey();

Fung · Accepted Answer

You are using the wrong method to load the HTML file; that's why the SelectNodes XPath query that follows finds nothing (it returns null, so the code inside if (pre != null) never runs).

doc.LoadHtml(string html) expects a string containing the HTML markup itself, not a path to a file. With your code, the document being parsed is just the literal text E://text.html, which contains no <pre> element.
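To make the difference concrete, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package and a hypothetical temporary file standing in for E://text.html. It compares passing the path to LoadHtml, passing the path to Load, and reading the file yourself before calling LoadHtml. The class and method names are made up for illustration:

```csharp
using System;
using System.IO;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class LoadDemo
{
    // Counts <pre> elements. By default SelectNodes returns null (not an
    // empty collection) when the XPath matches nothing, hence the ?. and ??.
    static int CountPre(HtmlDocument doc) =>
        doc.DocumentNode.SelectNodes("//pre")?.Count ?? 0;

    public static (int PathAsHtml, int ViaLoad, int ViaLoadHtml) Run()
    {
        // Hypothetical sample file standing in for E://text.html.
        string path = Path.Combine(Path.GetTempPath(), "ncedc_sample.html");
        File.WriteAllText(path, "<html><body><pre>Date Time Lat</pre></body></html>");

        // Wrong: LoadHtml parses its argument as markup, so the whole
        // "document" is just the path text, with no <pre> in it.
        var wrong = new HtmlDocument();
        wrong.LoadHtml(path);

        // Right: Load reads the file at the given path and parses its contents.
        var viaLoad = new HtmlDocument();
        viaLoad.Load(path);

        // Also right: read the file yourself and hand the markup to LoadHtml.
        var viaLoadHtml = new HtmlDocument();
        viaLoadHtml.LoadHtml(File.ReadAllText(path));

        File.Delete(path);
        return (CountPre(wrong), CountPre(viaLoad), CountPre(viaLoadHtml));
    }
}
```

Either of the last two variants works; Load is simply the shorter one when the HTML is already on disk.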

Try this instead:

doc.Load("E://text.html");
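As for proceeding further: once the file is loaded with Load, pre[0].InnerText holds the plain text of the listing. Below is a sketch of turning that text into typed rows, assuming (as in the sample in the question) that every data row follows the dashed separator line and has all 13 whitespace-separated fields. The Quake record and NcedcParser class are hypothetical names, and the parser uses only the base class library, so it does not depend on Html Agility Pack itself:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// One row of the NCEDC listing; field names mirror the column headers.
public record Quake(DateTime Time, double Lat, double Lon, double Depth,
                    double Mag, string Magt, int Nst, int Gap, int Clo,
                    double Rms, string Src, long EventId);

public static class NcedcParser
{
    // Parses the text of the <pre> element (e.g. pre[0].InnerText).
    // Assumption: data rows come after the "-----" separator line and
    // every field is present, separated by whitespace.
    public static List<Quake> Parse(string preText)
    {
        var rows = new List<Quake>();
        var ci = CultureInfo.InvariantCulture; // '.' decimals, '/' date separator
        bool inData = false;

        foreach (string raw in preText.Split('\n'))
        {
            string line = raw.Trim(); // also strips '\r' from Windows line endings
            if (!inData)
            {
                // Skip the search parameters and header until the dashed line.
                inData = line.StartsWith("-----");
                continue;
            }
            if (line.Length == 0) continue;

            // A null separator array means "split on any whitespace".
            string[] f = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (f.Length != 13) continue; // ignore anything that is not a complete row

            DateTime time = DateTime.ParseExact(f[0] + " " + f[1],
                "yyyy/MM/dd HH:mm:ss.ff", ci);
            rows.Add(new Quake(time,
                double.Parse(f[2], ci), double.Parse(f[3], ci), double.Parse(f[4], ci),
                double.Parse(f[5], ci), f[6],
                int.Parse(f[7], ci), int.Parse(f[8], ci), int.Parse(f[9], ci),
                double.Parse(f[10], ci), f[11], long.Parse(f[12], ci)));
        }
        return rows;
    }
}
```

In the question's code this would be called inside the if (pre != null) block as NcedcParser.Parse(pre[0].InnerText). Splitting on whitespace is the simplest choice for this listing; if a column could ever be blank, slicing each line at fixed character positions taken from the header would be safer. Each Quake can then be written to an Access table, for example with a parameterized OleDbCommand.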
