Reputation: 13
I'm trying to build a scraper that pulls content from Newegg. I installed Nokogiri in my Ruby on Rails app and, as far as I can tell, it's working. However, I'm having trouble extracting the element that holds the pricing information, and I'm not sure why. The code below should find every list item (li) with the class "price-current " and print each one. Instead, I get no results.
require 'rubygems'
require 'open-uri'
require 'nokogiri'
page = Nokogiri::HTML(open("http://www.newegg.com/Product/Product.aspx?Item=N82E16820313436"))
page.xpath('//li[@class="price-current "]').each do |item|
  puts item
end
I've been tearing my hair out for the last two hours trying to figure this out with no success. Any insight would be much appreciated!
EDIT: So, @MarkReed was right that the information I'm looking for is generated by JS. Looking through the page source, a lot of the detail sits in a JavaScript object literal (shown below). Is it possible to use a regex together with Nokogiri to pull that information out?
var utag_data = {
  page_breadcrumb:'Home > Computer Hardware > Memory > Desktop Memory > Team Group > Item#:N82E16820313436',
  page_tab_name:'Computer Hardware',
  product_category_id:['17'],
  product_category_name:['Memory'],
  product_subcategory_id:['147'],
  product_subcategory_name:['Desktop Memory'],
  product_id:['20-313-436'],
  product_web_id:['N82E16820313436'],
  product_title:['Team Zeus Yellow 8GB (2 x 4GB) 240-Pin DDR3 SDRAM DDR3 1600 (PC3 12800) Desktop Memory Model TZYD38G1600HC9DC01'],
  product_manufacture:['Team Group'],
  product_unit_price:['79.99'],
  product_sale_price:['66.99'],
  product_default_shipping_cost:['0.01'],
  product_type:['Newegg'],
  product_model:['TZYD38G1600HC9DC01'],
  product_instock:['1'],
  product_group_id:['0'],
  page_type:'Product',
  site_region:'USA',
  site_currency:'USD',
  page_name:'ProductDetail',
  search_scope:jQuery('#haQuickSearchStore option:selected').text(),
  user_nvtc:Web.StateManager.Cookies.get(Web.StateManager.Cookies.Name.NVTC),
  user_name:Web.StateManager.Cookies.get(Web.StateManager.Cookies.Name.LOGIN,'LOGINID6'),
  third_party_render:['3cb31f7b6faf223eb237af8c737abcebce803020','4774d6780334a7bf9c3c95255c60401916d07cae','e3770e5b640207523c7ac0afed2237ce2f79cd27','9c3638f897ed4a655fd0bd839f04e1c412d54bff','78b8b16d9d0f6f2e8419ac12fa710f5153f1cee3','65531e14b4d9b9a223cc3bfcb65ce7b5f356011d','2a5e772a0f941c862180037f8a5c118c7abf2f7d','9011adc5233493f5adc5f0f0f1bcb655892c09e3']
};
Upvotes: 1
Views: 138
Reputation: 95242
You appear to be searching for DOM elements that JavaScript adds dynamically in the browser after the page loads. They do not exist in the HTML originally fetched from the URL, so Nokogiri, which only parses that HTML and does not execute scripts, cannot see them.
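As for the regex question in the edit: Nokogiri itself has no regex search, but the text of the inline script is still part of the fetched HTML, so an ordinary Ruby regex can be run over it. The following is only a rough sketch, assuming the page still embeds a utag_data literal like the one quoted in the question. The helper name utag_value and the SAMPLE_SCRIPT string are made up for illustration; with a real page, the script text could be collected with something like page.xpath('//script').map(&:text).join("\n").

```ruby
# Stand-in for the inline <script> text of the fetched page (an assumption:
# the real page would need to contain a utag_data literal in this shape).
SAMPLE_SCRIPT = <<~'JS'
  var utag_data = {
    product_unit_price:['79.99'],
    product_sale_price:['66.99'],
    site_currency:'USD'
  };
JS

# Hypothetical helper: returns the first single-quoted value stored under
# +key+ (either key:'value' or key:['value']), or nil if the key is absent.
def utag_value(script_text, key)
  pattern = /\b#{Regexp.escape(key)}\s*:\s*\[?\s*'([^']*)'/
  match = script_text.match(pattern)
  match && match[1]
end

puts utag_value(SAMPLE_SCRIPT, 'product_sale_price')
puts utag_value(SAMPLE_SCRIPT, 'site_currency')
```

This kind of pattern matching is brittle: it only handles simple quoted values, and it will break if the site changes how the script is written, so treat it as a stopgap rather than a robust parser.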
Upvotes: 1