Reputation: 764
I have been at this for quite a while without success. Here is my case:
My friend's web application serves a page with fairly simple HTML that generates the data for its charts. I want to extract certain values from a table on that page, because he needs this information stored in a database.
This is part of the HTML table:
...
<tr>
<td width=30 align=center bgcolor=#006699 class=W><font color=white>1</font></td>
<td width=50 bgcolor=#FFFFFF align=center>7387</td>
<td width=30 height=25 align=center bgcolor=#006699 class=W><font color=white>2</font></td>
<td width=50 bgcolor=#FFFFFF align=center>2881</td>
<td width=30 height=25 align=center bgcolor=#006699 class=W><font color=white>3</font></td>
<td width=50 bgcolor=#FFFFFF align=center>8782</td>
<td width=30 height=25 align=center bgcolor=#006699 class=W><font color=white>4</font></td>
<td width=50 bgcolor=#FFFFFF align=center>5297</td>
<td width=30 height=25 align=center bgcolor=#006699 class=W><font color=white>5</font></td>
<td width=50 bgcolor=#FFFFFF align=center>749</td>
</tr>
<tr>
<td align=center bgcolor=#006699 class=W><font color=white>6</font></td>
<td width=50 bgcolor=#FFFFFF align=center>3136</td>
<td height=25 align=center bgcolor=#006699 class=W><font color=white>7</font></td>
<td width=50 bgcolor=#FFFFFF align=center>8768</td>
<td height=25 align=center bgcolor=#006699 class=W><font color=white>8</font></td>
<td width=50 bgcolor=#FFFFFF align=center>9548</td>
<td height=25 align=center bgcolor=#006699 class=W><font color=white>9</font></td>
<td width=50 bgcolor=#FFFFFF align=center>6565</td>
<td height=25 align=center bgcolor=#006699 class=W><font color=white>10</font></td>
<td width=50 bgcolor=#FFFFFF align=center>142</td>
</tr>
...
What I want to achieve is to find the td cells (as shown above) containing the given numbers, and then read the value from the td next to each of them. For the numbers 1 and 8, the output would be 1=7387 and 8=9548.
I got stuck quite quickly while trying to find the two td cells containing the given numbers.
My C# code so far:
using (WebClient webClient = new WebClient())
{
    string completeHTMLCode = webClient.DownloadString("someUrl.php?getChartData=" + chartId);
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(completeHTMLCode);
    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//td[@...]"))
    {
    }
}
Am I trying something impossible here?
Upvotes: 1
Views: 1021
Reputation: 13723
Well, if this table data is all you have to work with, it can be parsed using HtmlAgilityPack.
The first thing I'd do is drop the foreach over the tds and use a counter instead, so that the counter can index into the collection and reach the neighbouring cell. The code could look like this:
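// selectednodes is assumed to be the collection returned by SelectNodes("//td");
// myNodecollection is assumed to be a List<HtmlNode> that collects the value cells.
for (int i = 0; i < selectednodes.Count - 1; i++)
{
    // Label cells wrap their number in a <font> element
    if (selectednodes[i].InnerHtml.Contains("font"))
    {
        string label = selectednodes[i].InnerText.Trim();
        if (label == "1" || label == "8")
        {
            // The value cell is the td right after the label cell
            myNodecollection.Add(selectednodes[i + 1]);
        }
    }
}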
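As an alternative to the index loop, the label-to-value step could probably also be written as a single XPath query. This is only a sketch under a few assumptions: the class and method names (ChartLookup, FindValue) are made up for illustration, the labels are plain digits (so they are safe to splice into the query), and the markup is a cut-down version of the table in the question. It is intended to print the 1=7387 and 8=9548 pairs.

```csharp
using System;
using HtmlAgilityPack;

class ChartLookup
{
    // Hypothetical helper: returns the text of the td that follows the label cell
    // whose <font> text equals `label`, or null when no such cell exists.
    static string FindValue(HtmlDocument doc, string label)
    {
        // td[font='8'] matches a td with a <font> child whose text is exactly "8";
        // following-sibling::td[1] selects the very next td in the same row.
        HtmlNode valueCell = doc.DocumentNode.SelectSingleNode(
            "//td[font='" + label + "']/following-sibling::td[1]");
        return valueCell == null ? null : valueCell.InnerText.Trim();
    }

    static void Main()
    {
        // Reduced sample of the question's table (attributes trimmed for brevity)
        string html = "<table><tr>"
            + "<td class=W><font color=white>1</font></td><td align=center>7387</td>"
            + "<td class=W><font color=white>8</font></td><td align=center>9548</td>"
            + "</tr></table>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (string label in new[] { "1", "8" })
        {
            Console.WriteLine(label + "=" + FindValue(doc, label));
        }
    }
}
```

The query relies on each value cell being the immediate next td after its label cell, which holds for the rows shown in the question but would need checking against the full page.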
Upvotes: 0
Reputation: 854
You can just parse it into a dictionary and look it up that way. I could think of perhaps some better ways to parse it, but this does what you want.
void Main()
{
string html = @"<tr>
<td width=30 align=center bgcolor=#006699 class=W><font color=white>1</font></td>
<td width=50 bgcolor=#FFFFFF align=center>7387</td>
<td width=30 height=25 align=center bgcolor=#006699 class=W><font color=white>2</font></td>
<td width=50 bgcolor=#FFFFFF align=center>2881</td>
<td width=30 height=25 align=center bgcolor=#006699 class=W><font color=white>3</font></td>
<td width=50 bgcolor=#FFFFFF align=center>8782</td>
<td width=30 height=25 align=center bgcolor=#006699 class=W><font color=white>4</font></td>
<td width=50 bgcolor=#FFFFFF align=center>5297</td>
<td width=30 height=25 align=center bgcolor=#006699 class=W><font color=white>5</font></td>
<td width=50 bgcolor=#FFFFFF align=center>749</td>
</tr>
<tr>
<td align=center bgcolor=#006699 class=W><font color=white>6</font></td>
<td width=50 bgcolor=#FFFFFF align=center>3136</td>
<td height=25 align=center bgcolor=#006699 class=W><font color=white>7</font></td>
<td width=50 bgcolor=#FFFFFF align=center>8768</td>
<td height=25 align=center bgcolor=#006699 class=W><font color=white>8</font></td>
<td width=50 bgcolor=#FFFFFF align=center>9548</td>
<td height=25 align=center bgcolor=#006699 class=W><font color=white>9</font></td>
<td width=50 bgcolor=#FFFFFF align=center>6565</td>
<td height=25 align=center bgcolor=#006699 class=W><font color=white>10</font></td>
<td width=50 bgcolor=#FFFFFF align=center>142</td>
</tr>";
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);
    // Parse every td into a number; label cells wrap theirs in a <font> element
    int[] nodes = doc.DocumentNode.SelectNodes("//td").Select(dn =>
        int.Parse(dn.InnerHtml.Contains("font") ? dn.FirstChild.InnerHtml : dn.InnerHtml)
    ).ToArray();
    // The cells alternate label, value, so pair them up
    Dictionary<int,int> d = new Dictionary<int,int>();
    for (int i = 0; i < nodes.Length; i+=2)
        d.Add(nodes[i],nodes[i+1]);
    // Dump() is LINQPad's output helper
    d.Dump();
    d[1].Dump();
    d[8].Dump();
}
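Dump() only exists inside LINQPad, so in a regular console application the lookup step might look more like the sketch below. It assumes the td cells have already been parsed into an alternating label/value array as above (the numbers are hard-coded here from the question's table to keep the example self-contained), and the class name ChartOutput is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

class ChartOutput
{
    static void Main()
    {
        // Alternating label/value numbers, in the order the td cells appear in the question's table
        int[] nodes = { 1, 7387, 2, 2881, 3, 8782, 4, 5297, 5, 749,
                        6, 3136, 7, 8768, 8, 9548, 9, 6565, 10, 142 };

        var d = new Dictionary<int, int>();
        for (int i = 0; i + 1 < nodes.Length; i += 2)
            d[nodes[i]] = nodes[i + 1]; // the indexer overwrites instead of throwing on a repeated label

        foreach (int wanted in new[] { 1, 8 })
        {
            int value;
            if (d.TryGetValue(wanted, out value))
                Console.WriteLine(wanted + "=" + value); // prints 1=7387, then 8=9548
            else
                Console.WriteLine(wanted + " not found");
        }
    }
}
```

TryGetValue avoids the KeyNotFoundException that d[8] would throw if a label were missing from the page, and the i + 1 bound guards against an odd number of cells.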
Upvotes: 1
Reputation: 13402
I put together a quick CsQuery sample showing how to accomplish this.
string file = File.ReadAllText("a.html"); // read the HTML
CQ dom = file; // initialize CsQuery
CQ td = dom["td"]; // select all td elements
td.Each((i, e) => { // go through each td
    if (e.FirstChild != null) // the td has a child node (font)
    {
        if (e.FirstChild.NodeType != NodeType.TEXT_NODE) // skip cells whose first child is plain text
        {
            if (e.FirstChild.InnerText == "1") // the label is 1
            {
                Console.WriteLine(e.NextElementSibling.InnerText); // output the next cell's text
            }
            if (e.FirstChild.InnerText == "8") // same for 8, and so on
            {
                Console.WriteLine(e.NextElementSibling.InnerText);
            }
        }
    }
});
Console.ReadKey();
Upvotes: 3