calf

Reputation: 881

Watir-webdriver is progressing through script before Nokogiri finishes scraping

There are three forms on the page. All forms default to "today" for their date ranges. Each form is iteratively submitted with a date from a range (1/1/2013 - 1/3/2013, for example) and the resulting table is scraped.

The script then submits the date to the next form in line, and the table is scraped again. However, the scraping occurs before the dates are submitted.

I tried adding sleep 2 in between scrapes to no avail.

The script is here: https://gist.github.com/hnanon/de4801e460a31d93bbdc

Upvotes: 1

Views: 736

Answers (1)

Justin Ko

Reputation: 46846

The script appears to assume that Nokogiri and Watir will always be in sync. This is not correct.

When you do:

page = Nokogiri::HTML.parse(browser.html)

Nokogiri gets the browser's HTML at that one specific point in time. If Watir later makes a change in the browser (i.e. the HTML changes), the existing Nokogiri object will not know about it.

Each time you want to parse the html with Nokogiri, you need to create a new Nokogiri object using the browser's latest html.

An example to illustrate:

require 'watir-webdriver'
require 'nokogiri'

b = Watir::Browser.new

b.goto 'www.google.ca'
page = Nokogiri::HTML.parse(b.html)
p page
#=> This will be the Google page

b.goto 'www.yahoo.ca'
p page
#=> This will still be the Google page

page = Nokogiri::HTML.parse(b.html)
p page
#=> This will now be the Yahoo page

Upvotes: 3
