Scraping the href value of anchor in Ruby

Question

I'm working on a project where I have to scrape a "website," which is just an HTML file in one of my local folders. I've been trying to scrape the href value (a URL) of the anchor tag for each student. I'm also scraping other things, so ignore the rest. Here is what I have so far:

require 'nokogiri'

# Responsible for scraping the index page that lists all of the students.
# Should return an array of hashes in which each hash represents one student.
def self.scrape_index_page(index_url)
  # index_url is a path to a local HTML file, so read it directly
  doc = Nokogiri::HTML(File.read(index_url))
  # doc.css(".student-name").first.text
  # doc.css(".student-location").first.text
  # student_card = doc.css(".student-card").first
  # student_card.css("a").text   # returns the link text ("View Profile"), not the href
end

Here is one of the student profile cards. They all have the same structure, so I'm only interested in scraping the href URL value.

   View Profile
   Eric Chu
   Glenelg, MD

Thanks for your help!

Derek Hopper · Accepted Answer

Once you have an anchor element in Nokogiri, you can read its href attribute like this:

anchor["href"]

So in your example, you could get the href by doing the following:

student_card = doc.css(".student-card").first
href = student_card.css("a").first["href"]

If you wanted to collect all of the href values at once, you could do something like this:

hrefs = doc.css(".student-card a").map { |anchor| anchor["href"] }
