
Reputation: 2085

Nokogiri check if value exists

I'm scraping some web pages and I get the following error:

scrape.rb:27:in `block in <main>': undefined method `text' for nil:NilClass (NoMethodError)

It happens when I run my Ruby task and a page has no element matching the CSS selector.

Is there a way to check whether the CSS match exists, so the crawl won't stop? My code doesn't work :(

products.each do |product|
  web = Nokogiri::HTML(open(product))
  counter = products.index(product)

  if web.at_css('.entry-title').text != undefined
    puts "CSS content is not undefined"
  else
    puts "Error"
  end
end

Upvotes: 2

Views: 1846

Answers (2)


Reputation: 35

I agree that at_css combined with an if check is the best way to test whether a class exists. Here's an example I whipped up:

user_agents = ["Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0",
            "Mozilla/5.0 (compatible; Konqueror/3; Linux)",             
            "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0",
            "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4a) Gecko/20030401",
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; de-at) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.125 Safari/537.36",
            "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
            "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
            "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
            "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
            "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)",
            "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko", 
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586",
            "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6",
            "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0",
            "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B5110e Safari/601.1",
            "Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1",
            "Mozilla/5.0 (Linux; Android 5.1.1; Nexus 7 Build/LMY47V) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.76 Safari/537.36"]
user_agent = user_agents.sample

good_2_go = ""
my_bad = ""

crawls = []
crawls.push(good_2_go, my_bad)

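# Needs require 'nokogiri' and require 'open-uri'; on Ruby 3.0+ call URI.open instead of open.
crawls.each do |crawl|
  doc = Nokogiri::HTML(open(crawl, 'User-Agent' => user_agent).read, nil, 'utf-8')

  # at_css returns nil when nothing matches the selector
  entries = doc.at_css('.ml-item')

  if entries
    puts crawl
    puts "Found entries class, proceeding with scrape.."
  else
    puts crawl
    puts "Could not find base class for entries"
  end
end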

This will print to STDOUT:

   Found entries class, proceeding with scrape..
   Could not find base class for entries    

Upvotes: 0

6ft Dan

Reputation: 2445

at_css returns nil when nothing matches, so you can test the result with an if before calling text on it:

result = web.at_css('.entry-title')
if result
  puts "CSS content is not undefined"
  puts result.text
else
  puts "Error"
end

Upvotes: 3
