aboutaaron

Reputation: 5389

Nokogiri: Running into error "undefined method ‘text’ for nil:NilClass"

I'm new to programming, so please excuse my inexperience. I'm using Nokogiri to scrape a police crime log. Here is the code:

require 'rubygems'
require 'nokogiri'
require 'open-uri'

url = "http://www.sfsu.edu/~upd/crimelog/index.html"
doc = Nokogiri::HTML(URI.open(url)) # URI.open is required on Ruby 3.0+; plain open(url) worked on older versions
puts doc.at_css("title").text
doc.css(".brief").each do |brief|
  puts brief.at_css("h3").text
end

I used the selector gadget bookmarklet to find the CSS selector for the log (.brief). When I pass "h3" through brief.at_css I get all of the h3 tags with the content inside.

However, if I add the .text method to remove the tags, I get a NoMethodError.

Is there any reason why this is happening? What am I missing? Thanks!

Upvotes: 5

Views: 8403

Answers (2)

Paul.s

Reputation: 38728

To clarify: if you look at the structure of the HTML source, you will see that the very first occurrence of <div class="brief"> does not have a child h3 tag (it only has a child <p> tag).

The Nokogiri Docs say that

at_css(*rules)

Search this node for the first occurrence of CSS rules. Equivalent to css(rules).first. See Node#css for more information.

The docs state that at_css(*rules) is equivalent to css(rules).first. When there is a match (your .brief contains an h3), a Nokogiri::XML::Element object is returned, which responds to text. When your .brief does not contain an h3, nil is returned, which of course does not respond to text.

So if we call css(rules) (not at_css as you have), we get a Nokogiri::XML::NodeSet object back, which defines the text method as follows (notice the alias):

# Get the inner text of all contained Node objects
def inner_text
  collect{|j| j.inner_text}.join('')
end
alias :text :inner_text

Because the class is Enumerable, it iterates over the nodes it contains, calling inner_text on each and joining the results together. An empty NodeSet therefore returns an empty string instead of raising an error.

Therefore you can either perform a nil? check or, as @floatless correctly stated, just use the css method.

Upvotes: 8

Daniel O'Hara

Reputation: 13438

You just need to replace at_css with css and everything should be okay.

Upvotes: 4
