Reputation: 52
An RSS feed that I need to parse puts all of its information into the description element as an HTML table with inline CSS, so that it renders as a nice table in a viewer. This makes it hard to parse the actual strings out of it. For example, below is one of the description elements:
<table style="border-collapse: collapse; border-spacing: 0; color:#493800; font-size: 11px; border:solid 1px #bababa; margin: 10px;"><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Start Time</th><td style="padding:5px; margin:0; background:#fff;">21/11/2013 19:30 UTC</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Backup Job</th><td style="padding:5px; margin:0; background:#fff;">Backup</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Computer</th><td style="padding:5px; margin:0; background:#fff;">theComputer</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Disk</th><td style="padding:5px; margin:0; background:#fff;">theDisk</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Username</th><td style="padding:5px; margin:0; background:#fff;">theUsername</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Searched</th><td style="padding:5px; margin:0; background:#fff;">112306 (52.5 GB)</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Uploaded</th><td style="padding:5px; margin:0; background:#fff;">121 (29.1 MB)</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Duration</th><td style="padding:5px; margin:0; background:#fff;">0:19:23</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Errors</th><td style="padding:5px; margin:0; background:#fff;">0</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Result</th><td style="padding:5px; margin:0; 
background:#fff;">COMPLETE</td></tr></table> <p><a href="LINK">Details</a></p>
Buried in all that markup are pairs like Computer: Computername and Uploaded: Amountuploaded. I need to extract these but have no idea how to. I tried using HTML Agility Pack but could not get it to work, although I am pretty bad with it.
Any help would be really appreciated, thanks.
Upvotes: 1
Views: 277
Reputation: 1435
It looks to me like you can just use .NET's XML classes to parse it, since the fragment becomes a well-formed XML document once it is wrapped in a root element.
You will want to read up on .NET's XmlDocument parsing. http://www.codeproject.com/Articles/169598/Parse-XML-Documents-by-XMLDocument-and-XDocument is a good start.
To get the string into an XmlDocument, you just use:
string xTxt = "<table style=\"border-collapse: collapse; border-spacing: 0; color:#493800; font-size: 11px; border:solid 1px #bababa; margin: 10px;\"><tr><th style=\"padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;\">Start Time</th><td style=\"padding:5px; margin:0; background:#fff;\">21/11/2013 19:30 UTC</td></tr><tr><th style=\"padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;\">Backup Job</th><td style=\"padding:5px; margin:0; background:#fff;\">Backup</td></tr><tr><th style=\"padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;\">Computer</th><td style=\"padding:5px; margin:0; background:#fff;\">theComputer</td></tr><tr><th style=\"padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;\">Disk</th><td style=\"padding:5px; margin:0; background:#fff;\">theDisk</td></tr><tr><th style=\"padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;\">Username</th><td style=\"padding:5px; margin:0; background:#fff;\">theUsername</td></tr><tr><th style=\"padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;\">Searched</th><td style=\"padding:5px; margin:0; background:#fff;\">112306 (52.5 GB)</td></tr><tr><th style=\"padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;\">Uploaded</th><td style=\"padding:5px; margin:0; background:#fff;\">121 (29.1 MB)</td></tr><tr><th style=\"padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;\">Duration</th><td style=\"padding:5px; margin:0; background:#fff;\">0:19:23</td></tr><tr><th style=\"padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;\">Errors</th><td style=\"padding:5px; margin:0; background:#fff;\">0</td></tr><tr><th style=\"padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;\">Result</th><td style=\"padding:5px; margin:0; background:#fff;\">COMPLETE</td></tr></table><p><a href=\"LINK\">Details</a></p>";
// Requires: using System; using System.Xml;
XmlDocument d = new XmlDocument();
// Wrap the fragment in a single root element so it is a well-formed XML document.
d.LoadXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>" + xTxt + "</root>");
string t = "";
// Each <tr> holds one name/value pair.
XmlNodeList trNodes = d.SelectNodes("//tr");
foreach (XmlNode n in trNodes)
{
    XmlNode thNode = n.SelectSingleNode("th"); // item name
    XmlNode tdNode = n.SelectSingleNode("td"); // item value
    t += thNode.InnerText + ':';
    t += tdNode.InnerText + Environment.NewLine;
}
txtInfo.AppendText("trNodes.Count = " + trNodes.Count + '\n');
txtInfo.AppendText(t);
Notice that each item you want is in a tr element, with the name of the item in a th element and the value in a td element. That makes them easy to find, so the code above grabs all 10 tr elements into trNodes.
In the above example, I have a TextBox named txtInfo that I use to see my results, but I encourage you not to store the results in a string variable at all. My use of the t string variable is simply so you can see one way to convert the items to another form. The thNode.InnerText and tdNode.InnerText properties are what grab each item.
You may want to create a List of items, or maybe better, a class that has a property for each item, though I don't know whether the set of items will change. You could also create a class that does all this processing and use that class in your project. Whatever you want. :)
Happy coding!
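As a rough sketch of the "class that does all this processing" idea, something like the following could work. The class name BackupReportParser and the method name Parse are made up for illustration, and it assumes the description fragment is well-formed XML (every tag closed, no HTML-only entities such as &nbsp;), otherwise LoadXml will throw.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of any library: turns the description
// markup into a name -> value lookup.
public static class BackupReportParser
{
    public static Dictionary<string, string> Parse(string descriptionHtml)
    {
        // Wrap the fragment in a single root so it is a well-formed XML document.
        // Assumption: the fragment itself is well-formed XML.
        var doc = new XmlDocument();
        doc.LoadXml("<root>" + descriptionHtml + "</root>");

        var fields = new Dictionary<string, string>();
        foreach (XmlNode row in doc.SelectNodes("//tr"))
        {
            XmlNode th = row.SelectSingleNode("th"); // item name
            XmlNode td = row.SelectSingleNode("td"); // item value
            if (th == null || td == null)
                continue; // skip rows that do not follow the th/td pattern

            // If a name appears twice, the later row overwrites the earlier one.
            fields[th.InnerText.Trim()] = td.InnerText.Trim();
        }
        return fields;
    }
}
```

With that in place, BackupReportParser.Parse(xTxt)["Computer"] would give you the computer name directly, and a dictionary keeps working if the feed adds or removes rows later.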
Upvotes: 1