Ninja2k

Reputation: 879

Nokogiri multiple domains

Is it possible to search multiple pages or domains with Nokogiri? I know you can run several XPath/CSS searches against a single page, but can you do the same across multiple URLs?

For example, I want to scrape both http://www.asus.com/Notebooks_Ultrabooks/S56CA/#specifications and http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications

My Code

require 'nokogiri'
require 'open-uri'
require 'spreadsheet'

# URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open("http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications"))

#Grab our product specifications
data = doc.css('div#specifications div#spec-area ul.product-spec li')

#Modify our data
lines = data.map(&:text)

#Create the Spreadsheet
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet::Workbook.new

sheet1 = book.create_worksheet
sheet1.name = 'My First Worksheet'

#Output our data to the Spreadsheet
lines.each_with_index do |line, i|
  sheet1[i, 0] = line
end

book.write 'C:/Users/Barry/Desktop/output.xls'

Upvotes: 0

Views: 232

Answers (1)

the Tin Man

Reputation: 160551

Nokogiri has no concept of URLs; it only knows about a String or IO stream of XML or HTML. You're confusing OpenURI's purpose with Nokogiri's.

If you want to read from multiple sites, simply loop over the URLs, and pass the current URL to OpenURI to open the page:

%w[
  http://www.asus.com/Notebooks_Ultrabooks/S56CA/#specifications
  http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications
].each do |url|
  # On Ruby 3.0+ use URI.open; Kernel#open no longer accepts URLs.
  doc = Nokogiri::HTML(URI.open(url))
  # do something with the document...
end

OpenURI will read the page, and pass its contents to Nokogiri for parsing. Nokogiri will still only see one page at a time, because that's all it is passed by OpenURI.

Upvotes: 2
