Reputation: 11
My question is very similar to this one: "XmlNode.SelectSingleNode syntax to search within a node in C#".
I'm trying to use HTML Agility Pack to pull the price, condition, and shipping price for each offer. Here's the URL I am scraping: http://www.amazon.com/gp/offer-listing/0470108541/ref=dp_olp_used?ie=UTF8&condition=all
Here's a snippet of my code:
string results = "";
var w = new HtmlWeb();
var doc = w.Load(url);
var nodes = doc.DocumentNode.SelectNodes("//div[@class='a-row a-spacing-medium olpOffer']");
if (nodes != null)
{
    foreach (HtmlNode item in nodes)
    {
        var price = item.SelectSingleNode(".//span[@class='a-size-large a-color-price olpOfferPrice a-text-bold']").InnerText;
        var condition = item.SelectSingleNode(".//h3[@class='a-spacing-small olpCondition']").InnerText;
        var price_shipping = item.SelectSingleNode("//span[@class='olpShippingPrice']").InnerText;
        results += "price " + price + " condition " + condition + " ship " + price_shipping + "\r\n";
    }
}
return results;
No matter which combination of .//, ., ./, and / I try, I cannot get what I want (I'm only now learning XPath). Currently it returns just the first item over and over, just like the original question I referenced earlier. I think I'm missing a fundamental understanding of how selecting nodes works and/or what is considered a node.
UPDATE
OK, I've changed the URL to point to a different book, and the first two items are working as expected. When I change the third item (price_shipping) to use ".//", absolutely no information is pulled from anything. This must be because sometimes there is no shipping price at all and that span is omitted. How do I handle this? I tried if price_shipping != null.
UPDATE
Solved. I removed the ".InnerText" from the price_shipping lookup, which was causing issues when the node was null. Then I did the null check, and after that it was safe to use .InnerText.
Upvotes: 1
Views: 10640
Reputation: 11
Solved. SelectSingleNode returns null when nothing matches, so I removed the ".InnerText" from the price_shipping lookup, which was causing issues when the node was null. Then I did the null check, and after that it was safe to use .InnerText.
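For anyone landing here, the fix can be sketched roughly as follows. This is a minimal sketch, not the exact code I ended up with: the class name OfferScraper, the method ExtractOffers, and the "n/a" placeholder are made up for illustration, and it parses an HTML string with HtmlDocument.LoadHtml instead of loading the live page. The XPath class strings are the ones from the question and assume the page markup still uses them. It also uses ".//" for the shipping span, since an expression starting with "//" searches from the document root even when called on a child node, which is why the first item kept repeating.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

public static class OfferScraper
{
    // Hypothetical helper: returns one line per offer block found in the HTML.
    public static string ExtractOffers(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var offers = doc.DocumentNode.SelectNodes("//div[@class='a-row a-spacing-medium olpOffer']");
        if (offers == null)
        {
            return "";
        }

        var results = new StringBuilder();
        foreach (HtmlNode offer in offers)
        {
            // ".//" keeps each search inside the current offer node.
            HtmlNode priceNode = offer.SelectSingleNode(".//span[@class='a-size-large a-color-price olpOfferPrice a-text-bold']");
            HtmlNode conditionNode = offer.SelectSingleNode(".//h3[@class='a-spacing-small olpCondition']");
            HtmlNode shippingNode = offer.SelectSingleNode(".//span[@class='olpShippingPrice']");

            // Read InnerText only after the null check; "n/a" is an assumed placeholder.
            string price = priceNode?.InnerText.Trim() ?? "n/a";
            string condition = conditionNode?.InnerText.Trim() ?? "n/a";
            string shipping = shippingNode?.InnerText.Trim() ?? "n/a";

            results.Append("price " + price + " condition " + condition + " ship " + shipping + "\r\n");
        }
        return results.ToString();
    }
}
```

With this shape, an offer that has no shipping span produces "ship n/a" instead of failing, and each line reflects its own offer rather than the first one on the page.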
Upvotes: 0