Ali Ove

Reputation: 79

Populating Rails application with scraped content from another site

I need to seed or scrape the data from another site in order to have content for my project.

How do you go about scraping data from another site using your own rails app? Do you use a separate application/server to run some sort of cron job, then add that data to your rails app? Or is it possible to have your own site scrape the data and display it directly?

My first idea was to scrape a site using Mechanize, then add the data to the Fixtures in my rails app as seed data. Is there a better way? Maybe even a way to continuously scrape the other site to display the data using my own rails app?

Upvotes: 2

Views: 279

Answers (3)

MorboRe'

Reputation: 161

I use Heroku, which offers an add-on called Scheduler that works quite well for my little project. I believe it works much like cron.

Heroku Scheduler

Once the data is scraped, it goes directly into the database (PostgreSQL), and then you can display whatever you want through database queries.
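As a rough sketch (task and variable names are made up), the job that Heroku Scheduler runs is just a plain rake task; in a real app the block would fetch pages and write rows through your models:

```ruby
require 'rake'

# lib/tasks/scrape.rake (hypothetical file and task names)
scraped = []

desc 'Scrape external data and store it in the database'
task :scrape_source do
  # in a real app this would fetch pages and persist records,
  # e.g. Article.create!(title: ..., body: ...)
  scraped << 'one scraped record'
end

# Heroku Scheduler would run `rake scrape_source` on its own dyno;
# invoking it here only demonstrates the task.
Rake::Task[:scrape_source].invoke
```

You would then point the Scheduler add-on at `rake scrape_source` with whatever frequency you need.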

Upvotes: 0

Cryptex Technologies

Reputation: 1163

You can use the rufus-scheduler and watir-dom-wait gems to solve your problem. I have done a similar task, scraping the Amazon KDP book list. With the watir-dom-wait gem you can also fetch data loaded by Ajax requests; Mechanize and Nokogiri will not work for Ajax.

require 'rufus-scheduler'
require 'watir-dom-wait'
require 'selenium-webdriver'

# download the report from Amazon KDP
def download_report
  # log in
  @browser = Watir::Browser.new :chrome
  @browser.goto 'https://kdp.amazon.com/en_US/reports-new'
  @browser.input(name: 'email').send_keys('[email protected]')
  @browser.input(name: 'password').send_keys('password')
  @browser.input(id: 'signInSubmit').click
  @browser.span(text: 'Generate Report').click
end

scheduler = Rufus::Scheduler.new

# run the download once a day
scheduler.every '1d' do
  download_report
end

# keep the process alive so the scheduler can fire
scheduler.join

Upvotes: 2

Aymen Bou

Reputation: 1109

I use Nokogiri to scrape websites.

You don't need a separate application. You can have methods inside your models that handle all the scraping and populate your database, and then you can create a rake file that runs those methods.

I name mine scheduler.rake

This goes in /lib/tasks/

And then, if you're using Heroku, you can add the Scheduler add-on (it's available for free as of 28/12/2018).

Heroku has some pretty good docs explaining how to configure things on their side.

Upvotes: 0
