Reputation: 879
Is it possible to do multi-domain searches using Nokogiri? I am aware you can run multiple XPath/CSS searches against a single domain/page, but what about multiple domains?
For example, I want to scrape http://www.asus.com/Notebooks_Ultrabooks/S56CA/#specifications and http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications
My code:
require 'nokogiri'
require 'open-uri'
require 'spreadsheet'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open("http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications"))
# Grab our product specifications
data = doc.css('div#specifications div#spec-area ul.product-spec li')
# Modify our data
lines = data.map(&:text)
#Create the Spreadsheet
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet::Workbook.new
sheet1 = book.create_worksheet
sheet1.name = 'My First Worksheet'
# Output our data to the Spreadsheet
lines.each_with_index do |line, i|
  sheet1[i, 0] = line
end
book.write 'C:/Users/Barry/Desktop/output.xls'
Upvotes: 0
Views: 232
Reputation: 160551
Nokogiri has no concept of URLs; it only knows about a String or IO stream of XML or HTML. You're confusing OpenURI's purpose with Nokogiri's.
If you want to read from multiple sites, simply loop over the URLs and pass each one to OpenURI to open the page:
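require 'nokogiri'
require 'open-uri'

%w[
  http://www.asus.com/Notebooks_Ultrabooks/S56CA/#specifications
  http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications
].each do |url|
  # URI.open is OpenURI's entry point; a bare open(url) fails on Ruby 3.0+
  doc = Nokogiri::HTML(URI.open(url))
  # do something with the document...
end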
OpenURI will read the page, and pass its contents to Nokogiri for parsing. Nokogiri will still only see one page at a time, because that's all it is passed by OpenURI.
Upvotes: 2