Reputation: 35
I am working on a project in which I have to extract data from a website and store it in Access. I can already read the data from the website and save it as an HTML document, but now I want to parse that HTML document and store the result in Access. These are the contents of the HTML file:
<HTML><HEAD><TITLE>NCEDC_Search_Results</TITLE></HEAD><BODY>Your search parameters are:
<ul>
<li>start_time=2002/01/01,00:00:00
<li>end_time=2037/01/01,00:00:00
<li>minimum_magnitude=3.0
<li>maximum_magnitude=10
<li>etype=E
<li>rflag=A,F,H,I
<li>system=selected
<li>format=ncread
</ul>
<PRE>
Date Time Lat Lon Depth Mag Magt Nst Gap Clo RMS SRC Event ID
----------------------------------------------------------------------------------------------
2002/01/10 00:44:51.53 40.4415 -126.0167 25.37 3.92 Md 56 269 147 0.29 NCSN 21208454
2002/01/12 04:41:46.93 36.7690 -121.4812 7.74 3.06 Md 54 35 5 0.09 NCSN 21208721
</PRE>
</BODY></HTML>
I want the contents between the <pre> and </pre> tags.
The column names are the ones given in the header line of the HTML document above.
How can I achieve this using Html Agility Pack in C#? I tried this code, but how do I proceed from here?
string txt=null;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("E://text.html");
HtmlNode node = doc.DocumentNode;
HtmlNodeCollection pre = node.SelectNodes("//pre");
//var prenodes = doc.DocumentNode.SelectNodes("//pre");
if (pre != null)
{
}
Console.ReadKey();
}
Upvotes: 1
Views: 1525
Reputation: 3558
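You are using the wrong method to load the HTML file, which is why the SelectNodes XPath query that follows it doesn't work.
doc.LoadHtml(string html) expects a string containing the full HTML document, not a path to the document file. As written, the literal text "E://text.html" is parsed as the document; it contains no <pre> element, so SelectNodes returns null.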
Try this instead:
doc.Load("E://text.html");
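Once the file loads correctly, one way to proceed is to take the InnerText of the <pre> node and split it into rows and fields. The following is only a minimal sketch under a few assumptions: the class name PreParser and the method ParseRows are hypothetical helpers introduced for illustration, the file path is the one from the question, and data rows are recognised simply by starting with a digit (the year), which holds for the sample shown but may need adjusting for other input.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class PreParser
{
    // Hypothetical helper: splits the text of the <pre> block into rows of fields.
    // Assumes every data row starts with a digit (the year of the date column);
    // the header line and the dashed separator line are skipped for that reason.
    public static List<string[]> ParseRows(string preText)
    {
        var rows = new List<string[]>();
        string[] lines = preText.Split(new[] { '\r', '\n' },
                                       StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0 || !char.IsDigit(trimmed[0]))
                continue;

            // A null separator array makes Split break on any whitespace.
            string[] fields = trimmed.Split((char[])null,
                                            StringSplitOptions.RemoveEmptyEntries);
            rows.Add(fields);
        }
        return rows;
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.Load("E://text.html"); // path taken from the question

        // SelectSingleNode returns null when nothing matches the XPath.
        HtmlNode pre = doc.DocumentNode.SelectSingleNode("//pre");
        if (pre == null)
        {
            Console.WriteLine("No <pre> element found.");
            return;
        }

        foreach (string[] fields in ParseRows(pre.InnerText))
        {
            // In the sample data each row has 13 fields, from Date to Event ID.
            Console.WriteLine(string.Join(" | ", fields));
        }
    }
}
```

Each string[] then holds one record in column order (Date, Time, Lat, Lon, Depth, Mag, Magt, Nst, Gap, Clo, RMS, SRC, Event ID). The header line itself is not split on whitespace here because "Event ID" contains a space and would produce one field too many.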
Upvotes: 1