Reputation: 5389
I'm new to programming, so please excuse my inexperience. I'm using Nokogiri to scrape a police crime log. Here is the code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.sfsu.edu/~upd/crimelog/index.html"
doc = Nokogiri::HTML(URI.open(url)) # URI.open is needed on Ruby 3.0+, where plain open no longer accepts URLs
puts doc.at_css("title").text
doc.css(".brief").each do |brief|
  puts brief.at_css("h3").text
end
I used the SelectorGadget bookmarklet to find the CSS selector for the log entries (.brief). When I call brief.at_css("h3") I get all of the h3 tags along with their content.
However, if I add the .text method to strip the tags, I get a NoMethodError.
Why is this happening? What am I missing? Thanks!
Upvotes: 5
Views: 8403
Reputation: 38728
To clarify: if you look at the structure of the HTML source, you will see that the very first occurrence of <div class="brief"> does not have a child h3 tag (it actually only has a child <p> tag).
The Nokogiri docs say:
at_css(*rules)
Search this node for the first occurrence of CSS rules. Equivalent to css(rules).first. See Node#css for more information.
In other words, calling at_css(*rules) is equivalent to calling css(rules).first. When there is a match (your .brief element contains an h3), a Nokogiri::XML::Element object is returned, which responds to text. When your .brief element does not contain an h3, nil (the sole instance of NilClass) is returned, which of course does not respond to text.
So if we call css(rules) instead (not at_css, as you have), we get a Nokogiri::XML::NodeSet object back, which defines the text method as follows (notice the alias):
# Get the inner text of all contained Node objects
def inner_text
  collect{|j| j.inner_text}.join('')
end
alias :text :inner_text
Because the class is Enumerable, it iterates over the nodes it contains, calls inner_text on each one, and joins the results together.
Therefore you can either perform a nil? check or, as @floatless correctly stated, just use the css method.
Upvotes: 8
Reputation: 13438
You just need to replace at_css with css and everything should be okay.
Upvotes: 4