user1442073

Reputation: 37

XML with HTML table to parse with C#

I'm following an RSS feed that returns XML. The XML contains HTML tables, each returned as one long string. I'm trying to access the elements of such a table in C# so that I can use each of them as a variable in another program. An example table:

<table cellpadding="5"><tr><td><strong>Date (GMT)</strong></td><td><strong>Event</strong></td><td><strong>Cons.</strong></td><td><strong>Actual</strong></td><td><strong>Previous</strong></td></tr><tr><td>Jun 7 11:00</td><td>Announcement</td><td>6.250 %</td><td>6.310  %</td><td>6.560  %</td></tr></table>

Just about every similar thread on here suggests HtmlAgilityPack, which I'm trying to use. So far I've been able to pull out the HTML table and store it in a string variable, but I can't seem to pull out the individual table elements. The following is my hack, based on several users' suggestions:

XmlDocument xDoc = new XmlDocument();
xDoc.Load("http://rssfeed.com");
string descr = xDoc.SelectSingleNode("rss/channel/item/description").InnerText;

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml("descr");
// A Print statement here (textBox1.Text = descr;) shows that I'm successfully accessing the HTML table
var table = doc.DocumentNode.Descendants("tr")
    .Select(n => n.Elements("td").Select(o => o.InnerText).ToArray());

foreach (var tr in table)
{
    textBox1.Text = String.Format("{0} {1} {2}", tr[0], tr[1], tr[2]);
}

Any and all suggestions are extremely welcome.

Thanks, D

Upvotes: 3

Views: 5950

Answers (1)

Chuck Savage

Reputation: 11945

This worked for me, and it will work for you as long as the HTML is well-formed XML and the values are always inside a td. The Value of a td that contains a single child element (such as the strong elements in the header row) is the same as that child element's value.

XElement table = XElement.Parse("<table cellpadding=\"5\"><tr><td><strong>Date (GMT)</strong></td><td><strong>Event</strong></td><td><strong>Cons.</strong></td><td><strong>Actual</strong></td><td><strong>Previous</strong></td></tr><tr><td>Jun 7 11:00</td><td>Announcement</td><td>6.250 %</td><td>6.310  %</td><td>6.560  %</td></tr></table>");
string[] values = table.Descendants("td").Select(td => td.Value).ToArray();

Or, to get each row as an array of values:

var rows = table.Elements()
    .Select(tr => tr.Elements().Select(td => td.Value).ToArray())
    .ToList();

Update:

foreach (string value in values)
    Console.WriteLine(value);

foreach (string[] row in rows)
    foreach (string value in row)
        Console.WriteLine(value);
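
Putting the pieces above together, here is a minimal sketch of a complete console program that pairs each data cell with its column header. The `TableParser` class and `ParseRows` method are hypothetical names introduced for illustration, the sample table is shortened to three columns, and the code assumes the table string is well-formed XML with every row having the same number of td cells.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of the original answer).
static class TableParser
{
    // Returns one string[] per tr, with one entry per td.
    // Throws an XmlException if the input is not well-formed XML.
    public static List<string[]> ParseRows(string html)
    {
        XElement table = XElement.Parse(html);
        return table.Elements("tr")
            // td.Value concatenates the text of nested elements such as strong.
            .Select(tr => tr.Elements("td").Select(td => td.Value.Trim()).ToArray())
            .ToList();
    }
}

class Program
{
    static void Main()
    {
        // Shortened version of the table from the question (assumption: three columns).
        string html =
            "<table cellpadding=\"5\">" +
            "<tr><td><strong>Date (GMT)</strong></td><td><strong>Event</strong></td><td><strong>Actual</strong></td></tr>" +
            "<tr><td>Jun 7 11:00</td><td>Announcement</td><td>6.310  %</td></tr>" +
            "</table>";

        List<string[]> rows = TableParser.ParseRows(html);

        // Treat the first row as the header and print "header: value" for the rest.
        // Assumes every data row has as many cells as the header row.
        string[] header = rows[0];
        foreach (string[] row in rows.Skip(1))
        {
            for (int i = 0; i < header.Length; i++)
                Console.WriteLine("{0}: {1}", header[i], row[i]);
        }
    }
}
```

If the feed ever delivers HTML that is not valid XML (unclosed tags, unescaped ampersands), `XElement.Parse` will throw, and an HTML parser such as HtmlAgilityPack would be the more tolerant choice.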

Upvotes: 2
