Reputation: 6862
What are the screen-scraping options (gems or libraries) for Rails 3? I have used Nokogiri in the past, but wanted to know whether there are better options for Rails 3.
Upvotes: 2
Views: 1479
Reputation: 139
You can also use the Scrapifier gem to get metadata from URIs found in a string. It's very simple to use:
'Wow! What an awesome site: http://adtangerine.com!'.scrapify
#=> {
# title: "AdTangerine | Advertising Platform for Social Media",
# description: "AdTangerine is an advertising platform that uses the tangerine as a virtual currency for advertisers and publishers in order to share content on social networks.",
# images: ["http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/logo_adt_og.png", "http://s3-us-west-2.amazonaws.com/adtangerine-prod/users/avatars/000/000/834/thumb/275747_1118382211_1929809351_n.jpg", "http://adtangerine.com/assets/foobar.gif"],
# uri: "http://adtangerine.com"
# }
Upvotes: 0
Reputation: 1323
If this is a one-off task or your target data set is relatively small (under a hundred pages), use Mechanize (browse and scrape) or Anemone (does whatever Mechanize does, plus some additional crawling-specific options).
If you need to automate this collection or you are dealing with large data sets, consider using a web service. Bobik is a good choice in this category.
Upvotes: 2
Reputation: 38772
On the fantastic RubyTools website you can find several Ruby libraries for parsing HTML. Still, Nokogiri is the most popular.
Upvotes: 0
Reputation: 160551
Rails doesn't do screen scraping. You are free to use Ruby code that adds that functionality, but Rails by itself only handles generating pages.
Mechanize, which uses Nokogiri internally, is a good choice; otherwise, I always roll my own using Nokogiri and OpenURI.
Upvotes: 1