Reputation: 55273
I have stuff like this:
<div class="new-faceout p13nimp" id="purchase_B005ZVWBGK" data-asin="B005ZVWBGK" data-ref="pd_sim_hg_1">
I'm scraping its data like this:
product_product = @data.css('#purchaseShvl')
product_product.css('.shoveler-cell').each do | product_product |
product_product_asin = product_product.xpath('.//div[@class="new-faceout"]')
(etc...)
How can I extract the values of data-asin and data-ref?
I tried this:
product_product_asin = product_product.xpath('.//div[@class="new-faceout"]/@data-ref').first.value
but it returns nil.
Live page: http://www.amazon.com/gp/product/B00BATSB60/
Upvotes: 1
Views: 229
Reputation: 369084
Use the Nokogiri::XML::Node#attr method to get an attribute:
>> prd = product_product.at_css('.new-faceout')
>> prd.attr('data-asin')
=> "B005ZVWBGK"
>> prd.attr('data-ref')
=> "pd_sim_hg_1"
You can also use Nokogiri::XML::Node#[]:
>> prd['data-asin']
=> "B005ZVWBGK"
>> prd['data-ref']
=> "pd_sim_hg_1"
Upvotes: 2
Reputation: 79733
Specifying HTML classes with XPath is a bit tricky. In this case you can’t just use [@class="new-faceout"] because the actual value of the class attribute is new-faceout p13nimp, so the exact comparison doesn’t match. You would need to use something like this:
[contains(concat(' ', @class, ' '), ' new-faceout ')]
as the condition. There are quite a few questions here on Stack Overflow about this, as well as elsewhere on the web.
With Nokogiri you can combine CSS and XPath for a simpler approach: fetch the node with CSS first, then use XPath relative to it, for example:
@data.at_css('.new-faceout').at_xpath('./@data-ref')
Or, if you have fetched the node with CSS, you can use the Nokogiri methods attr or just [] to fetch the attribute value directly (attribute returns the attribute node, so call value on it):
@data.at_css('.new-faceout')['data-ref']
Upvotes: 0