Gibson

Reputation: 2085

Nokogiri check if value exists

I'm scraping some web pages and get the following error:

scrape.rb:27:in `block in <main>': undefined method `text' for nil:NilClass (NoMethodError)

when running my Ruby task. It happens because the page doesn't contain any element matching the CSS selector.

Is there a way to check that the CSS result is not undefined, so the scraper won't stop crawling? My code below doesn't work:

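require 'nokogiri'
require 'open-uri'

products.each do |product|
  web = Nokogiri::HTML(URI.open(product))

  counter = products.index(product)

  # Raises NoMethodError here when at_css finds no match and returns nil
  if web.at_css('.entry-title').text != undefined
    puts "CSS content is not undefined"
  else
    puts "Error"
  end
end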

Upvotes: 2

Views: 1846

Answers (2)

CookieMonstROR-JS

Reputation: 35

I agree that checking the result of at_css with an if is the best way to test whether an element with a given class exists. Here's an example I put together:

require 'nokogiri'
require 'open-uri'

user_agents = ["Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0",
            "Mozilla/5.0 (compatible; Konqueror/3; Linux)",             
            "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0",
            "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4a) Gecko/20030401",
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; de-at) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.125 Safari/537.36",
            "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
            "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
            "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
            "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
            "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)",
            "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko", 
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586",
            "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6",
            "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0",
            "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B5110e Safari/601.1",
            "Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1",
            "Mozilla/5.0 (Linux; Android 5.1.1; Nexus 7 Build/LMY47V) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.76 Safari/537.36"]
user_agent = user_agents.sample

good_2_go = "https://gomovies.to/genre/action/1"
my_bad = "https://gomovies.to/genre/action/100"

crawls = []
crawls.push(good_2_go, my_bad)

crawls.each do |crawl|
  # URI.open comes from open-uri; plain Kernel#open no longer opens URLs on Ruby 3.0+
  doc = Nokogiri::HTML(URI.open(crawl, 'User-Agent' => user_agent).read, nil, 'utf-8')

  entries = doc.at_css('.ml-item')

  if entries
    puts crawl
    puts "Found entries class, proceeding with scrape.."
  else
    puts crawl
    puts "Could not find base class for entries"
  end
end

This prints:

https://gomovies.to/genre/action/1
Found entries class, proceeding with scrape..
https://gomovies.to/genre/action/100
Could not find base class for entries

Upvotes: 0

6ft Dan

Reputation: 2445

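You can check the object returned by at_css with an if before calling text on it. at_css returns nil when nothing matches, and nil is falsy in Ruby, so the else branch runs instead of raising NoMethodError: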

result = web.at_css('.entry-title')
if result
  puts "CSS content is not undefined"
  puts result.text
else
  puts "Error"
end

Upvotes: 3
