GN.

Reputation: 9909

How to handle nils with an Anemone / Nokogiri web scraper?

require 'anemone'

def scrape!(url)
  Anemone.crawl(url) do |anemone|
    # %r{...} builds a Regexp for the URL pattern; %[...] would build a plain String
    anemone.on_pages_like %r{/events/detail/.*} do |page|
      show = {
        headliner: page.doc.at_css('h1.summary').text,
        openers: page.doc.at_css('.details h2').text
      }
      puts show
    end
  end
end

I'm writing a scraper with Anemone, which uses Nokogiri under the hood.

Sometimes the selector '.details h2' matches nothing because the element isn't in the HTML. In that case at_css returns nil, and calling text on it raises a NoMethodError.

I'd like to avoid having if/else checks like this all over the place:

   if page.doc.at_css('.details h2')
     show[:openers] = page.doc.at_css('.details h2').text
   end

Is there a more elegant way of handling errors caused by inconsistent markup? For instance, CoffeeScript has the existential operator: person.name?.first(). If the HTML has the element, great: build the object and call text on it. If not, move on and don't add that key to the hash.
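Ruby's closest equivalent to CoffeeScript's ?. is the safe navigation operator &. (Ruby 2.3 and later): nil&.text evaluates to nil instead of raising. A minimal sketch, assuming Ruby 2.4+ so that Hash#compact is available to drop the keys whose lookup came back nil. FakeNode, FAKE_DOC and the top-level at_css are hypothetical stand-ins for Nokogiri, used only so the example runs without the gem.

```ruby
# Hypothetical stand-in for a Nokogiri node: anything that responds to #text.
FakeNode = Struct.new(:text)

# Hypothetical stand-in for the parsed page: only the headliner is present.
FAKE_DOC = { 'h1.summary' => FakeNode.new('The Headliner') }.freeze

# Mimics Nokogiri's at_css: returns a node, or nil when nothing matches.
def at_css(selector)
  FAKE_DOC[selector]
end

show = {
  headliner: at_css('h1.summary')&.text,
  openers:   at_css('.details h2')&.text # &. yields nil instead of raising NoMethodError
}.compact # removes every key whose value is nil (Hash#compact, Ruby 2.4+)

puts show.inspect
```

In the real scraper the same shape would read page.doc.at_css('.details h2')&.text, with .compact applied to the finished hash.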

Upvotes: 0

Views: 142

Answers (1)

Hugo Sousa

Reputation: 916

You just need to do:

anemone.on_pages_like %r{/events/detail/.*} do |page|
  unless page.nil?
    # ... your code
  end
end
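The check above guards against a nil page. However, at_css can still return nil for an element that is missing from a page that does exist. One way to keep that check in a single place is a small helper that maps hash keys to selectors and only stores the fields that matched. This is a minimal sketch: extract_fields, SELECTORS, FakeDoc and FakeNode are hypothetical names, and FakeDoc#at_css only mimics Nokogiri's nil-on-no-match behaviour so the example runs without nokogiri or anemone.

```ruby
# Hypothetical stand-ins for Nokogiri so the sketch runs on its own.
FakeNode = Struct.new(:text)

class FakeDoc
  def initialize(nodes)
    @nodes = nodes
  end

  # Like Nokogiri's at_css: the first match, or nil when nothing matches.
  def at_css(selector)
    @nodes[selector]
  end
end

# Hypothetical mapping from result keys to CSS selectors.
SELECTORS = {
  headliner: 'h1.summary',
  openers:   '.details h2'
}.freeze

# Builds a hash containing only the fields whose selector matched an element.
def extract_fields(doc, selectors)
  selectors.each_with_object({}) do |(key, selector), result|
    node = doc.at_css(selector) # nil when the element is missing
    result[key] = node.text.strip if node
  end
end

doc = FakeDoc.new({ 'h1.summary' => FakeNode.new("  The Headliner \n") })
show = extract_fields(doc, SELECTORS)
puts show.inspect
```

Inside the crawl block this would be called as extract_fields(page.doc, SELECTORS). It is worth guarding page.doc as well, since Anemone may not build a document for non-HTML responses.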

Upvotes: 0
