ZK Zhao

Reputation: 21523

How do I get the HREF attribute of an anchor tag?

I'm trying to scrape the sites from http://expo.getbootstrap.com/

The HTML looks like this:

<div class="col-span-4">
  <p>
    <a class="thumbnail" target="_blank" href="https://www.getsentry.com/">
      <img src="/screenshots/sentry.jpg">
    </a>
  </p>
</div>

My Nokogiri code is:

require 'nokogiri'
require 'open-uri'

url = "http://expo.getbootstrap.com/"
doc = Nokogiri::HTML(URI.open(url))
puts doc.css("title").text
doc.css(".col-span-4").each do |site|
  title = site.css("h4 a").text
  href = site.css("a.thumbnail")[0]['href']
end

The goal is simple: get the link's href, the <img> tag's src, and the site's title. But it keeps reporting:

undefined method `[]' for nil:NilClass

in the line:

href = site.css("a.thumbnail")[0]['href']

It's driving me crazy, because the same code works in another situation.

Upvotes: 2

Views: 988

Answers (2)

the Tin Man

Reputation: 160551

I'd do something like:

require 'nokogiri'
require 'open-uri'
require 'pp'

doc = Nokogiri::HTML(URI.open('http://expo.getbootstrap.com/'))

thumbnails = doc.search('a.thumbnail').map{ |thumbnail|
  {
    href: thumbnail['href'],
    src: thumbnail.at('img')['src'],
    title: thumbnail.parent.parent.at('h4 a').text
  }
}

pp thumbnails

Which, after running, outputs:

# => [
  {
    :href => "https://www.getsentry.com/",
    :src => "/screenshots/sentry.jpg",
    :title => "Sentry"
  },
  {
    :href => "http://laravel.com",
    :src => "/screenshots/laravel.jpg",
    :title => "Laravel"
  },
  {
    :href => "http://gruntjs.com",
    :src => "/screenshots/gruntjs.jpg",
    :title => "Grunt"
  },
  {
    :href => "http://labs.bittorrent.com",
    :src => "/screenshots/bittorrent-labs.jpg",
    :title => "BitTorrent Labs"
  },
  {
    :href => "https://www.easybring.com/en",
    :src => "/screenshots/easybring.jpg",
    :title => "Easybring"
  },
  {
    :href => "http://developers.kippt.com/",
    :src => "/screenshots/kippt-developers.jpg",
    :title => "Kippt Developers"
  },
  {
    :href => "http://www.learndot.com/",
    :src => "/screenshots/learndot.jpg",
    :title => "Learndot"
  },
  {
    :href => "http://getflywheel.com/",
    :src => "/screenshots/flywheel.jpg",
    :title => "Flywheel"
  }
]

Upvotes: 2

rainkinz

Reputation: 10394

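You're not accounting for the fact that not all .col-span-4 divs contain a thumbnail. For those divs, site.css("a.thumbnail") returns an empty NodeSet, so [0] returns nil, and calling ['href'] on nil raises the error. Skipping them should work: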

require 'nokogiri'
require 'open-uri'

url = "http://expo.getbootstrap.com/"
doc = Nokogiri::HTML(URI.open(url))
puts doc.css("title").text
doc.css(".col-span-4").each do |site|
  title = site.css("h4 a").text
  thumbnail = site.css("a.thumbnail")
  next if thumbnail.empty?   # skip divs that have no a.thumbnail
  href = thumbnail[0]['href']
end

Upvotes: 1
