emanuele

Reputation: 2589

How to avoid launching the Firefox GUI when scraping a JavaScript-heavy web page

I am trying to scrape a web page that uses a lot of JavaScript. With the help of pguardiano, I have this piece of Ruby code:

 require 'rubygems'
 require 'watir-webdriver'
 require 'csv'

 @browser = Watir::Browser.new
 @browser.goto 'http://www.oddsportal.com/matches/soccer/'
 CSV.open('out.csv', 'w') do |out|
   @browser.trs(:class => /deactivate/).each do |tr|
     out << tr.tds.map(&:text)
   end
 end

The scrape runs repeatedly in the background, with a sleep of roughly one hour between runs. I have no experience with Ruby, and none with web scraping in particular, so I have a couple of questions.

  1. How can I avoid opening a new Firefox session on every run, which consumes a lot of CPU and RAM?

  2. Is it possible to use the Firefox engine without its GUI?

Upvotes: 1

Views: 892

Answers (1)

Dave McNulla

Reputation: 2016

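You can try a headless option. The headless gem wraps Xvfb, a virtual X display, so Firefox runs normally but never draws a window on screen. This requires a Linux host with Xvfb installed.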

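require 'watir-webdriver'
require 'headless'

# Start a virtual X display (Xvfb) so Firefox has somewhere to render
headless = Headless.new
headless.start

# The browser opens inside the virtual display, so no window appears
b = Watir::Browser.start 'www.google.com'
puts b.title

# Shut down the browser first, then the virtual display
b.close
headless.destroy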
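
To address the first question as well, one possible approach is to start the virtual display and the browser once and reuse them for every hourly run, instead of opening a new Firefox session each time. The following is only a sketch under assumptions: it presumes the `headless` gem and Xvfb are installed, and `scrape_rows` and `run_forever` are hypothetical helper names introduced here, not part of Watir.

```ruby
require 'csv'

# Writes one CSV row per matching table row.
# `browser` is any object that responds to `trs` the way a Watir browser does.
def scrape_rows(browser, path)
  CSV.open(path, 'w') do |out|
    browser.trs(:class => /deactivate/).each do |tr|
      out << tr.tds.map(&:text)
    end
  end
end

# Starts one virtual display and one browser, then reuses both for every run.
# (Hypothetical helper; the interval is in seconds.)
def run_forever(url, path, interval_seconds)
  # Loaded here so that scrape_rows can be used without these gems
  require 'watir-webdriver'
  require 'headless'

  headless = Headless.new
  headless.start
  browser = Watir::Browser.new

  loop do
    browser.goto url          # reload the page in the same session
    scrape_rows(browser, path)
    sleep interval_seconds
  end
ensure
  # Runs when the loop is interrupted or an error is raised
  browser.close if browser
  headless.destroy if headless
end
```

Calling `run_forever('http://www.oddsportal.com/matches/soccer/', 'out.csv', 3600)` would then scrape about once an hour with a single Firefox process. Note that opening the file in `'w'` mode overwrites `out.csv` on each run, as in the original code.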

An alternative is to use the Selenium server. A third option is a dedicated scraping tool such as Kapow.

Upvotes: 2
