Reputation: 9
I am parsing prada.com and would like to scrape data in the div class "nextItem" and get its name and price. Here is my code:
require 'rubygems'
require 'mechanize'
require 'nokogiri'
require 'open-uri'
agent = Mechanize.new
page = agent.get('http://www.prada.com/en/US/e-store/department/woman/handbags.html?cmp=from_home')
fp = File.new('prada_prices','w')
html_doc = Nokogiri::HTML(page)
page = html_doc.xpath("//ol[@class='nextItem']")
page.each do {|i| fp.write(i.text + "\n")}
end
I get an error and no output. What I think I am doing is instantiating a mechanize object and calling it agent. Then creating a page variable and assigning it the url provided. Then creating a variable that is a nokogiri object with the mechanize url passed in Then searching the url for all class references that are titled nextItem Then printing all the data contained there
Can someone show me where I might have went wrong?
Upvotes: 0
Views: 2850
Reputation: 1409
Here are the wrong parts:
{}
or do
/end
but not both in the same time.Mechanize#get
returns a Mechanize::Page
which act as a Nokogiri document, at least it has search
, xpath
, css
. Use them instead of trying to coerce the document to a Nokogiri::HTML object.require 'open-uri'
, and require 'nokogiri'
when you are not using them directly.Here is the code with fixes:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.prada.com/en/US/e-store/department/woman/handbags.html?cmp=from_home')
fp = File.new('prada_prices','w')
page = page.search("//ol[@class='nextItem']").each do |i|
fp.write(i.text + "\n")
end
fp.close
Upvotes: 0
Reputation: 8331
Since Prada's website dynamically loads its content via JavaScript, it will be hard to scrape its content. See "Scraping dynamic content in a website" for more information.
Generally speaking, with Mechanize, after you get a page:
page = agent.get(page_url)
you can easily search items with CSS selectors and scrape for data:
next_items = page.search(".fooClass")
next_items.each do |item|
price = item.search(".fooPrice").text
end
Then simply handle the strings or generate hashes as you desire.
Upvotes: 2