Reputation: 3825
I have two blocks of HTML. The first one looks like this:
<div class="a-row">
<a class="a-size-small a-link-normal a-text-normal" href="/Chemical-Guys-CWS-107-Extreme-Synthetic/dp/B003U4P3U0/ref=sr_1_1_sns?s=automotive&ie=UTF8&qid=1504525216&sr=1-1">
<span aria-label="$19.51" class="a-color-base sx-zero-spacing">
<span class="sx-price sx-price-large">
<sup class="sx-price-currency">$</sup>
<span class="sx-price-whole">19</span>
<sup class="sx-price-fractional">51</sup>
</span>
</span>
<span class="a-letter-space"></span>Subscribe & Save
</a>
</div>
And the second block:
<div class="a-row a-spacing-none">
<a class="a-link-normal a-text-normal" href="https://rads.stackoverflow.com/amzn/click/com/B003U4P3U0" rel="nofollow noreferrer">
<span aria-label="$22.95" class="a-color-base sx-zero-spacing">
<span class="sx-price sx-price-large">
<sup class="sx-price-currency">$</sup>
<span class="sx-price-whole">22</span>
<sup class="sx-price-fractional">95</sup>
</span>
</span>
</a>
<span class="a-letter-space"></span>
<i class="a-icon a-icon-prime a-icon-small s-align-text-bottom" aria-label="Prime">
<span class="a-icon-alt">Prime</span>
</i>
</div>
The two blocks are very similar in structure, but I only want to extract the price from the block that also contains, next to the link, an i element with the attribute aria-label="Prime".
This is how I currently extract the price, but it doesn't do what I need:
if (htmlDoc.DocumentNode.SelectNodes("//span[@class='a-color-base sx-zero-spacing']") != null)
{
    var span = htmlDoc.DocumentNode.SelectSingleNode("//span[@class='a-color-base sx-zero-spacing']");
    price = span.Attributes["aria-label"].Value;
}
This selects the first matching element (position 0), even though there is more than one. What I want is the span from the block that contains the Prime icon, as in the second piece of HTML above. If no such block exists, I would simply fall back to the first method shown above.
Can someone help me out with this?
I've also tried something like this:
var pr = htmlDoc.DocumentNode.SelectNodes("//a[@class='a-link-normal a-text-normal']")
.Where(x => x.SelectSingleNode("//i[@class='a-icon a-icon-prime a-icon-small s-align-text-bottom']") != null)
.Select(x => x.SelectSingleNode("//span[@class='a-color-base sx-zero-spacing']").Attributes["aria-label"].Value);
But it still returns the first element.
Here is a newer version:
var pr = htmlDoc.DocumentNode.SelectNodes("//a[@class='a-link-normal a-text-normal']");
string prrrrrr = "";
for (int i = 0; i < pr.Count; i++)
{
    if (pr.ElementAt(i).SelectNodes("//i[@class='a-icon a-icon-prime a-icon-small s-align-text-bottom']").ElementAt(i) != null)
    {
        prrrrrr = pr.ElementAt(i).SelectNodes("//span[@class='a-color-base sx-zero-spacing']").ElementAt(i).Attributes["aria-label"].Value;
    }
}
The idea is to collect all the a elements from the HTML into a node collection, loop through them, check which one actually contains the element I'm looking for, and take the price from that one.
The problem is that this if statement always passes:
if (pr.ElementAt(i).SelectNodes("//i[@class='a-icon a-icon-prime a-icon-small s-align-text-bottom']").ElementAt(i) != null)
How can I check each individual element in the node collection?
Upvotes: 2
Views: 1450
Reputation: 5832
I think you should start at the div level, with the elements whose class starts with a-row. Then loop over them and check whether the div contains an i element whose aria-label attribute equals 'Prime'. Finally, get the span with the class a-color-base sx-zero-spacing and read the value of its aria-label attribute, like this:
// Select every div whose class attribute starts with "a-row".
HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//div[starts-with(@class,'a-row')]");
foreach (HtmlNode node in nodes)
{
    // Relative XPath: only an i element that is a direct child of this div.
    HtmlNode i = node.SelectSingleNode("i[@aria-label='Prime']");
    if (i != null)
    {
        // ".//" limits the search to descendants of this div.
        HtmlNode span = node.SelectSingleNode(".//span[@class='a-color-base sx-zero-spacing']");
        if (span != null)
        {
            string currentValue = span.Attributes["aria-label"].Value;
        }
    }
}
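As for why the attempts in the question always hit the first element: an XPath expression that starts with // is evaluated from the document root, even when SelectSingleNode or SelectNodes is called on a child node. Prefixing it with a dot (.//) restricts the search to that node. Below is a minimal sketch that also covers the fallback to the first price when no Prime row exists. It assumes HtmlAgilityPack and the markup from the question, and I have not run it against the live page. The names PrimePriceExample and FindPrice are made up for illustration.

```csharp
using HtmlAgilityPack;

public static class PrimePriceExample
{
    // Hypothetical helper: returns the aria-label of the price span in the row
    // that carries a Prime icon; if there is no such row, returns the first
    // price span in the document; returns null if no price span exists at all.
    public static string FindPrice(HtmlDocument htmlDoc)
    {
        const string priceSpan = "span[@class='a-color-base sx-zero-spacing']";

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows =
            htmlDoc.DocumentNode.SelectNodes("//div[starts-with(@class,'a-row')]");

        if (rows != null)
        {
            foreach (HtmlNode row in rows)
            {
                // ".//" searches inside this row only; "//" would search the whole document.
                if (row.SelectSingleNode(".//i[@aria-label='Prime']") == null)
                {
                    continue;
                }

                HtmlNode primePrice = row.SelectSingleNode(".//" + priceSpan);
                if (primePrice != null)
                {
                    return primePrice.Attributes["aria-label"]?.Value;
                }
            }
        }

        // Fallback from the question: the first price span anywhere in the document.
        HtmlNode first = htmlDoc.DocumentNode.SelectSingleNode("//" + priceSpan);
        return first?.Attributes["aria-label"]?.Value;
    }
}
```

To use it, load the page into an HtmlDocument as you already do and assign the result of PrimePriceExample.FindPrice(htmlDoc) to price, checking for null in case the page contains no price at all.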
Upvotes: 1