Reputation: 97
I'm having trouble working out how to parse the links out of a table on a site. According to Firebug, the table's exact XPath is:
/html/body/div/form/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr[1]/td/table/tbody/tr/td/table/tbody/tr[2]/td/table/tbody/tr[1]/td/div/table/tbody/tr[3]/td/div/table/tbody/tr/td/div/table
It also has an id of 'ctl00_cp1_GridView1' (which hasn't been much help so far).
All I want to do is find all of the links in the first column and add them to a list.
Here's my current code snippet (with some help from this post):
protected void btnSubmitURL_Click(object sender, EventArgs e)
{
    try
    {
        List<string> siteList = new List<string>();
        int counter = 1;
        var web = new HtmlWeb();
        var doc = web.Load(txtURL.Text);
        // This is the lookup that comes back null.
        var table = doc.DocumentNode.SelectSingleNode("html/body/div/form/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr[1]/td/table/tbody/tr/td/table/tbody/tr[2]/td/table/tbody/tr[1]/td/div/table/tbody/tr[3]/td/div/table/tbody/tr/td/div/table[@id='ctl00_cp1_GridView1']/tbody");
        HtmlNodeCollection rows = table.SelectNodes("./tr");
        if (rows != null)
        {
            for (int i = 0; i < rows.Count; i++)
            {
                // First cell of the current row.
                HtmlNodeCollection cols = rows[i].SelectNodes("./td[1]");
                if (cols != null)
                {
                    for (int j = 0; j < cols.Count; j++)
                    {
                        // Index with j (the column loop), not i.
                        HtmlNode aTags = cols[j].SelectSingleNode("./a[@id='NormalColoredFont']");
                        if (aTags != null)
                        {
                            siteList.Add(counter + ". " + aTags.InnerHtml + " - " + aTags.Attributes["href"].Value);
                            counter++; // keep the numbering in step with the list
                        }
                    }
                }
            }
        }
        lblOutput.Text = siteList.Count.ToString();
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.ToString());
    }
}
I keep getting a NullReferenceException right at the HtmlNodeCollection rows line, because SelectSingleNode can't find that specific table and returns null. I've tried searching by the table id, but that hasn't helped either.
Any help with getting to that table would be appreciated.
Upvotes: 0
Views: 1357
Reputation: 97
I was finally able to extract all of the links using an example from Scott Mitchell. His example is as follows:
var linksOnPage = from lnks in document.DocumentNode.Descendants()
                  where lnks.Name == "a" &&
                        lnks.Attributes["href"] != null &&
                        lnks.InnerText.Trim().Length > 0
                  select new
                  {
                      Url = lnks.Attributes["href"].Value,
                      Text = lnks.InnerText
                  };
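For anyone who only needs the links inside the grid rather than every link on the page, here is a minimal sketch that selects by the table id from the question instead of the full path. It rests on an assumption: that the absolute Firebug path fails because browsers add tbody elements to the DOM that are usually absent from the raw HTML that Html Agility Pack parses. The GridLinks class and Extract method are names made up for illustration, and the sketch is untested against the original site.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class GridLinks
{
    // Returns "text - href" for each link found in the first cell of every
    // row of the table with the given id (e.g. "ctl00_cp1_GridView1").
    // Class and method names are hypothetical, not part of any library.
    public static List<string> Extract(string html, string tableId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // '//' matches at any depth, so the XPath needs neither the long
        // absolute path nor the tbody steps that Firebug reported.
        var anchors = doc.DocumentNode.SelectNodes(
            "//table[@id='" + tableId + "']//tr/td[1]//a[@href]");

        var links = new List<string>();
        if (anchors == null)
        {
            // SelectNodes returns null (not an empty collection) on no match.
            return links;
        }

        foreach (var a in anchors)
        {
            links.Add(a.InnerText.Trim() + " - " + a.GetAttributeValue("href", ""));
        }
        return links;
    }
}
```

With the page loaded as in the question, the call might look like GridLinks.Extract(doc.DocumentNode.OuterHtml, "ctl00_cp1_GridView1"). If the grid contains nested tables, their first-column links would be picked up too, so the XPath may need tightening for a particular page.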
Thanks to jessehouwing and casperOne for responding quickly!
Upvotes: 2