Rhys

Reputation: 2877

HTML Agility Pack LINQ parse returns no results

I am trying to parse this HTML document to get the contents of the flight, time, origin, and date cells and output them.

<div id="FlightInfo_FlightInfoUpdatePanel">

<table cellspacing="0" cellpadding="0">
<tbody>
    <tr class="">
    <td class="airline"><img src="/images/airline logos/US.gif" title="US AIRWAYS. " alt="US AIRWAYS. " /></td>
    <td class="flight">US5316</td>
    <td class="codeshare">NZ46</td>
    <td class="origin">Rarotonga</td>
    <td class="date">02 Sep</td>
    <td class="time">10:30</td>
    <td class="est">21:30</td>
    <td class="status">CHECK IN CLOSING</td>
    </tr>

I am using this code, based on the HTML Agility Pack for Windows Phone 7, to find and output the content of <td class="flight">US5316</td>:

void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
    var html = e.Result;

    var doc = new HtmlDocument();
    doc.LoadHtml(html);


    var node = doc.DocumentNode.Descendants("div")
        .FirstOrDefault(x => x.Id == "FlightInfo_FlightInfoUpdatePanel")
        .Element("table")
        .Element("tbody")
        .Elements("tr")
        .Where(tr => tr.GetAttributeValue("td", "").Contains("class"))
        .SelectMany(tr => tr.Descendants("flight"))
        .ToArray();

    this.scrollViewer1.Content = node;  

    // Added below

    listBox1.ItemsSource = node;
}

I get no results in either the ScrollViewer or the ListBox. Is the LINQ query I am using correct for the HTML I supplied?

Upvotes: 1

Views: 1214

Answers (2)

alf

Reputation: 18550

What do you intend to do with this line?

.Where(tr => tr.GetAttributeValue("td", "").Contains("class"))

GetAttributeValue(name, def) looks for an attribute named name on the node and returns that attribute's value if it finds one. Otherwise, it returns the default value def.

So what's actually happening here is that <tr> doesn't have any attribute with the key td, so it's returning the default value (an empty string), which does not contain the substring "class", so your <tr> node is being filtered out.
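To make the difference concrete, here is a minimal sketch, assuming a trimmed-down copy of the HTML from the question. The Demo class and the inline html string are only for illustration. It shows that asking the <tr> for an attribute called td yields the default value, while filtering the <td> nodes on their class attribute picks out the flight cell:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Demo
{
    static void Main()
    {
        // Assumption: a shortened version of the markup in the question.
        var html = "<table><tr class=\"\">" +
                   "<td class=\"flight\">US5316</td>" +
                   "<td class=\"origin\">Rarotonga</td>" +
                   "</tr></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var tr = doc.DocumentNode.Descendants("tr").First();

        // <tr> has no attribute named "td", so the default ("") is returned
        // and the original Where(...) filter rejects the row.
        string missing = tr.GetAttributeValue("td", "");
        Console.WriteLine(missing.Contains("class"));

        // Filter the <td> nodes by the value of their class attribute instead.
        string[] flights = doc.DocumentNode.Descendants("td")
            .Where(td => td.GetAttributeValue("class", "") == "flight")
            .Select(td => td.InnerText.Trim())
            .ToArray();

        foreach (var flight in flights)
        {
            Console.WriteLine(flight);
        }
    }
}
```

Note that "td" is an element name and "class" is an attribute name, so they belong in Descendants/Elements and GetAttributeValue respectively, not the other way round.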

Edit: This will return an array where each entry is an array of 8 strings containing the contents of each td:

var node = doc.DocumentNode.Descendants("div")
    .FirstOrDefault(x => x.Id == "FlightInfo_FlightInfoUpdatePanel")
    .Element("table")
    .Element("tbody")
    .Elements("tr")
    .Select(tr => tr.Elements("td").Select(td => td.InnerText).ToArray())
    .ToArray();

Examples:

node[0][1] == "US5316"
node[0][3] == "Rarotonga"

Upvotes: 1

Claus Jørgensen

Reputation: 26336

You're trying to set the content of a ScrollViewer to a string[] (an array). So I'll repeat myself, and say that you should take some time to learn basic C# before you continue this endeavour.

What you need to do is use a ListBox instead of the ScrollViewer, and then set ListBox.ItemsSource to your node string array.
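As a rough sketch of that binding step, assuming node is the array of string arrays (one per table row) produced by the query in the other answer: a ListBox displays one item per entry, so each row can first be joined into a single display string. The FlightRows class and ToDisplayLines helper below are hypothetical names used only for this illustration:

```csharp
using System.Linq;

static class FlightRows
{
    // Hypothetical helper: turns each row of cell texts into one line of text,
    // skipping cells that are empty (such as the cell that only holds an <img>).
    public static string[] ToDisplayLines(string[][] rows)
    {
        return rows
            .Select(cells => string.Join(
                " | ",
                cells.Where(c => c.Trim().Length > 0).ToArray()))
            .ToArray();
    }
}

// Usage inside the DownloadStringCompleted handler (listBox1 is the ListBox
// from the question):
//
//     listBox1.ItemsSource = FlightRows.ToDisplayLines(node);
```

Joining the cells is only one option. Binding each row to an item template with separate columns would also work, but that goes beyond what the question shows.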

Upvotes: 0
