wyc

Reputation: 55273

Scraping values from "data-" with Nokogiri?

I have stuff like this:

<div class="new-faceout p13nimp" id="purchase_B005ZVWBGK" data-asin="B005ZVWBGK" data-ref="pd_sim_hg_1">

I'm scraping its data like this:

product_product = @data.css('#purchaseShvl')

product_product.css('.shoveler-cell').each do |product_product|
  product_product_asin = product_product.xpath('.//div[@class="new-faceout"]')

(etc...)

How can I extract the values of data-asin and data-ref?

I tried this:

 product_product_asin  = product_product.xpath('.//div[@class="new-faceout"]/@data-ref').first.value

but the value returns nil.

Live page: http://www.amazon.com/gp/product/B00BATSB60/

Upvotes: 1

Views: 229

Answers (2)

falsetru

Reputation: 369084

Use the Nokogiri::XML::Node#attr method to get an attribute's value:

>> prd = product_product.at_css('.new-faceout')

>> prd.attr('data-asin')
=> "B005ZVWBGK"
>> prd.attr('data-ref')
=> "pd_sim_hg_1"

You can also use Nokogiri::XML::Node#[]:

>> prd['data-asin']
=> "B005ZVWBGK"
>> prd['data-ref']
=> "pd_sim_hg_1"

Upvotes: 2

matt

Reputation: 79733

Specifying HTML classes with XPath is a bit tricky. In this case you can’t just use [@class="new-faceout"], because the actual value of the class attribute is new-faceout p13nimp, so an exact string comparison doesn’t match. You would need to use something like this:

[contains(concat(' ', @class, ' '), ' new-faceout ')]

as the condition. There are quite a few questions here on Stack Overflow about this, as well as elsewhere on the web.

With Nokogiri you can combine CSS and XPath to produce a simpler technique, by fetching the node with CSS first and then using XPath, e.g.

@data.at_css('.new-faceout').at_xpath('./@data-ref')

Or, if you have fetched the node with CSS, you can use the Nokogiri methods attribute, attr, or just [] to get at the attribute directly:

@data.at_css('.new-faceout')['data-ref']

Upvotes: 0
