Reputation: 447
I'm learning scraping with Ruby. I'm trying to count the number of outbound links on a given page, but I'm not sure how to tell Ruby to count only the outbound links.
My current code:
require "open-uri"
# Collect info
puts "What is your URL?"
url = gets.chomp
puts "Your URL is #{url}"
puts "Loading..."
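# Fetch the page and count closing anchor tags
# (URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs)
page = URI.open(url).read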
link_total = page.scan("</a>")
# obl_count = ???
link_count = link_total.count
puts "Your site has a total of #{link_count} links."
How can I complete this?
Upvotes: 1
Views: 619
Reputation: 9
You can use Anemone:
Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.
The multi-threaded design makes Anemone fast. The API makes it simple. And the expressiveness of Ruby makes it powerful.
Upvotes: 0
Reputation: 211670
You should never parse HTML with regular expressions or plain string matching. Use Nokogiri to do the dirty work for you.
In simple terms you can use CSS selectors to find tags. From there it's easy to count:
Nokogiri::HTML(page).css('a').length
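That gives the total. To count only the outbound links, one option is to compare each link's host with the host of the page you fetched. Here is a rough sketch using only the standard `uri` library. The method names `outbound_link?` and `count_outbound_links` are made up for illustration, and it assumes "outbound" means "resolves to an http(s) URL on a different host", so `www.example.com` and `example.com` would count as different sites:

```ruby
require "uri"

# Hypothetical helper: true when href resolves to an http(s) URL whose host
# differs from the host of base_url. Relative links, fragments, mailto: links
# and hrefs that cannot be parsed are treated as not outbound.
def outbound_link?(href, base_url)
  base = URI.parse(base_url)
  # Resolve relative and protocol-relative hrefs against the page URL
  target = URI.join(base_url, href.to_s.strip)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes
  return false unless target.is_a?(URI::HTTP)
  target.host.to_s.downcase != base.host.to_s.downcase
rescue URI::Error
  false
end

# Hypothetical helper: counts the hrefs that point to another host.
def count_outbound_links(hrefs, base_url)
  hrefs.count { |href| outbound_link?(href, base_url) }
end
```

The hrefs themselves can be collected with Nokogiri, for example `Nokogiri::HTML(page).css('a[href]').map { |a| a['href'] }`, and then passed to `count_outbound_links(hrefs, url)`.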
Upvotes: 2