Reputation: 921
I need to parse/extract information from an HTML page. Basically, I load the page as a string using System.Net.WebClient and use HTML Agility Pack to get the content inside HTML tags (forms, labels, inputs and so on).
However, some content is inside a JavaScript script tag, like this:
<script type="text/javascript">
//<![CDATA[
var itemCol = new Array();
itemCol[0] = {
pid: "01010101",
Desc: "Some desc",
avail: "Available",
price: "$10.00"
};
itemCol[1] = {
pid: "01010101",
Desc: "Some desc",
avail: "Available",
price: "$10.00"
};
//]]>
</script>
So, how could I parse it into a collection in .NET? Can HTML Agility Pack help with that? I really appreciate any help.
Thanks in advance.
Upvotes: 0
Views: 551
Reputation: 499302
HAP will not parse the JavaScript for you. The most it can do is give you the raw contents of the script element.
javascript.net may fit the bill.
Upvotes: 1
Reputation: 408
Using the javascript.net library, you can run the script and get the collection back:
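// scriptTags is an XPathNodeIterator over the page's script elements
using (JavascriptContext context = new JavascriptContext())
{
    // Expose a .NET object to the script under the name "data"
    context.SetParameter("data", new MyObject());
    StringBuilder s = new StringBuilder();
    foreach (XPathNavigator nav in scriptTags)
    {
        // AppendLine keeps a trailing "//]]>" comment from swallowing the next statement
        s.AppendLine(nav.InnerXml);
    }
    // After the page's script has run, copy itemCol onto the .NET object
    s.Append(";data.item = itemCol;");
    context.Run(s.ToString());
    MyObject o = context.GetParameter("data") as MyObject;
}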
Then define a data structure like this:
class MyObject
{
public object item { get; set; }
}
Upvotes: 1
Reputation: 408
Which part of the content inside the script tag do you want, and what kind of collection are you expecting? Either way, you can select the script tags like this:
HtmlDocument document = new HtmlDocument();
// LoadHtml parses an HTML string; Load(string) expects a file path instead
document.LoadHtml(downloadedHtml);
XPathNavigator n = document.CreateNavigator();
XPathNodeIterator scriptTags = n.Select("//script");
foreach (XPathNavigator nav in scriptTags)
{
    string innerXml = nav.InnerXml;
    // Parse inner xml using regex
}
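For the regex step, here is a minimal sketch of one way it could look. It assumes the script always has the shape shown in the question: itemCol[n] = { ... } assignments with flat, double-quoted values and no nested braces or escaped quotes. The Item and ItemColParser names are made up for illustration, and a script that deviates from that shape would need a real JavaScript engine instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical type for illustration; properties mirror the keys in the script.
public class Item
{
    public string Pid { get; set; }
    public string Desc { get; set; }
    public string Avail { get; set; }
    public string Price { get; set; }
}

public static class ItemColParser
{
    // One "itemCol[n] = { ... }" assignment; captures the text between the braces.
    // Assumption: the object literal contains no nested braces.
    private static readonly Regex EntryPattern =
        new Regex(@"itemCol\[\d+\]\s*=\s*\{(?<body>[^}]*)\}");

    // One key: "value" pair.
    // Assumption: values are double-quoted and contain no escaped quotes.
    private static readonly Regex PairPattern =
        new Regex(@"(?<key>\w+)\s*:\s*""(?<value>[^""]*)""");

    public static List<Item> Parse(string script)
    {
        var items = new List<Item>();
        foreach (Match entry in EntryPattern.Matches(script))
        {
            // Collect the pairs of this entry, ignoring key casing (Desc vs desc).
            var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (Match pair in PairPattern.Matches(entry.Groups["body"].Value))
            {
                fields[pair.Groups["key"].Value] = pair.Groups["value"].Value;
            }

            items.Add(new Item
            {
                Pid = Get(fields, "pid"),
                Desc = Get(fields, "Desc"),
                Avail = Get(fields, "avail"),
                Price = Get(fields, "price")
            });
        }
        return items;
    }

    // Returns null when the key is missing from the entry.
    private static string Get(Dictionary<string, string> fields, string key)
    {
        string value;
        return fields.TryGetValue(key, out value) ? value : null;
    }
}
```

Inside the loop you would then call something like List<Item> items = ItemColParser.Parse(innerXml); and skip script tags for which the list comes back empty.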
Upvotes: 1