Bhushan Lodha

Reputation: 6862

Screen Scraping in Rails 3

What gems or libraries are available for screen scraping in Rails 3? I have used Nokogiri in the past, but wanted to know whether there are better options in Rails 3.

Upvotes: 2

Views: 1479

Answers (4)

Tiago G.

Reputation: 139

You can also use the Scrapifier gem to get metadata from URIs found in a string. It's very simple to use:

'Wow! What an awesome site: http://adtangerine.com!'.scrapify

 #=> {
 #   title:       "AdTangerine | Advertising Platform for Social Media",
 #   description: "AdTangerine is an advertising platform that uses the tangerine as a virtual currency for advertisers and publishers in order to share content on social networks.",
 #   images:      ["http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/logo_adt_og.png", "http://s3-us-west-2.amazonaws.com/adtangerine-prod/users/avatars/000/000/834/thumb/275747_1118382211_1929809351_n.jpg", "http://adtangerine.com/assets/foobar.gif"],
 #   uri:         "http://adtangerine.com"
 # }

Upvotes: 0

Yevgeniy

Reputation: 1323

If this is a one-off task or your target data set is relatively small (under a hundred pages), use Mechanize (browse and scrape) or Anemone (which does whatever Mechanize does, plus some additional crawling-specific options).

If you need to automate this collection or if you are dealing with large data sets, consider using a web service. Bobik is a good choice in this bucket.

Upvotes: 2

fguillen

Reputation: 38772

On the fantastic RubyTools website you can find several Ruby libraries for parsing HTML. Nokogiri is still the most popular.

Upvotes: 0

the Tin Man

Reputation: 160551

Rails itself doesn't do screen scraping; on its own it only generates pages. You are free to use Ruby code that adds that functionality.

Mechanize, which uses Nokogiri internally, is a good choice. Otherwise, I always roll my own using Nokogiri and OpenURI.

Upvotes: 1
