Utopia025

Reputation: 1201

Nokogiri returning values as a string, not an array

I'm running a script using Nokogiri that returns multiple values. I was under the impression (and reassured by multiple sources) that the results should be in the form of an array. Instead I'm getting an ugly-looking string. Here is the code:

require 'nokogiri'
require 'open-uri'
require 'spreadsheet'

profile_page_scraper = Nokogiri::HTML(URI.open('http://www.crunchbase.com/company/facebook'))
puts profile_page_scraper.css('div.col1_content td.td_left').text

Which returns this:

PublicDateRaisedPost IPO ValuationWebsiteBlogTwitterCategoryEmployeesFoundedDescription

I know I can use map to fix this quickly, but I am confused as to why this isn't returning an array. It should, theoretically, return something like this:

["Public", "Date", "Raised" ... "Description"]

Any ideas why this isn't working?

Upvotes: 3

Views: 2184

Answers (2)

tokland

Reputation: 67850

NodeSet#text always returns a string (otherwise it would probably be called NodeSet#texts). Nokogiri's docs are not so great; when in doubt, check the source code:

  # lib/nokogiri/xml/node_set.rb
  def inner_text
    collect{|j| j.inner_text}.join('')
  end
  alias :text :inner_text

To get an array of texts: nodes.map(&:text)

Upvotes: 7

pje

Reputation: 22697

The css method is returning an instance of Nokogiri::XML::NodeSet, which includes Enumerable.

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open('http://www.crunchbase.com/company/facebook'))
nodes = doc.css('div.col1_content td.td_left')
nodes.class.ancestors
# => [Nokogiri::XML::NodeSet, Enumerable, Object, Kernel, BasicObject]

So you can use all the standard Enumerable iterators together with the content method of each node in this result set. For example:

nodes.each { |n| puts n.content }

Upvotes: 1
