I'm running a script using Nokogiri that returns multiple values. I was under the impression (and was reassured by multiple sources) that the results should come back as an array. Instead, I'm getting an ugly-looking string. Here is the code:
require 'nokogiri'
require 'open-uri'
require 'spreadsheet'
# URI.open is the open-uri entry point; Kernel#open no longer fetches URLs in Ruby 3.0+
profile_page_scraper = Nokogiri::HTML(URI.open('http://www.crunchbase.com/company/facebook'))
puts profile_page_scraper.css('div.col1_content td.td_left').text
This prints:
PublicDateRaisedPost IPO ValuationWebsiteBlogTwitterCategoryEmployeesFoundedDescription
I know I can use map to fix this quickly, but I am confused as to why this isn't returning an array. Theoretically, it should return something like this:
["Public", "Date", "Raised" ... "Description"]
Any ideas why this isn't working?
NodeSet#text always returns a string (otherwise it would probably be called NodeSet#texts). The Nokogiri docs are not great; when in doubt, check the source code:
# lib/nokogiri/xml/node_set.rb
def inner_text
  collect{|j| j.inner_text}.join('')
end
alias :text :inner_text
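To see why the result is a single string, here is a stdlib-only Ruby model of the quoted inner_text logic. FakeNode and FakeNodeSet are hypothetical stand-ins, not Nokogiri classes; the sketch only assumes the collect/join shape of the source quoted above.

```ruby
# Hypothetical stand-in for a Nokogiri node: it only knows its own text.
FakeNode = Struct.new(:inner_text) do
  alias_method :text, :inner_text
end

# Hypothetical stand-in for Nokogiri::XML::NodeSet (not the real class).
class FakeNodeSet
  include Enumerable

  def initialize(nodes)
    @nodes = nodes
  end

  def each(&block)
    return enum_for(:each) unless block_given?
    @nodes.each(&block)
    self
  end

  # Same shape as the quoted source: collect each node's text, then join
  # with an empty separator, so the pieces run together into one String.
  def inner_text
    collect { |j| j.inner_text }.join('')
  end
  alias text inner_text
end

set = FakeNodeSet.new(%w[Public Date Raised].map { |s| FakeNode.new(s) })
joined = set.text        # => "PublicDateRaised"
texts  = set.map(&:text) # => ["Public", "Date", "Raised"]
p joined
p texts
```

The empty join separator is exactly why the labels in the question appear glued together.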
To get an array of texts: nodes.map(&:text)
The css method returns an instance of Nokogiri::XML::NodeSet, which includes Enumerable.
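require 'nokogiri'
require 'open-uri'
# URI.open is the open-uri entry point; Kernel#open no longer fetches URLs in Ruby 3.0+
doc = Nokogiri::HTML(URI.open('http://www.crunchbase.com/company/facebook'))
nodes = doc.css('div.col1_content td.td_left')
nodes.class.ancestors
# => [Nokogiri::XML::NodeSet, Enumerable, Object, Kernel, BasicObject]
# (the exact list varies with your Nokogiri and Ruby versions)
Nokogiri::XML::NodeSet.include?(Enumerable)
# => true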
So you can use all the standard Enumerable iterators together with the content method of each element in the result set. For example:
nodes.each { |n| puts n.content }