Reputation: 881
There are three forms on the page. All forms default to "today" for their date ranges. Each form is iteratively submitted with a date from a range (1/1/2013 - 1/3/2013, for example) and the resulting table is scraped.
The script then submits the date to the next form in line, and again the table is scraped. However, the scraping occurs before the dates are submitted.
I tried adding sleep 2 in between scrapes, to no avail.
The script is here: https://gist.github.com/hnanon/de4801e460a31d93bbdc
Upvotes: 1
Views: 736
Reputation: 46846
The script appears to assume that Nokogiri and Watir will always be in sync. This is not correct.
When you do:
page = Nokogiri::HTML.parse(browser.html)
Nokogiri gets a copy of the browser's HTML at that one specific point in time. If Watir later changes the browser (i.e., changes the HTML), the existing Nokogiri object will not know about it.
Each time you want to parse the html with Nokogiri, you need to create a new Nokogiri object using the browser's latest html.
An example to illustrate:
require 'watir-webdriver'
require 'nokogiri'
b = Watir::Browser.new
b.goto 'www.google.ca'
page = Nokogiri::HTML.parse(b.html)
p page
#=> This will be the Google page
b.goto 'www.yahoo.ca'
p page
#=> This will still be the Google page
page = Nokogiri::HTML.parse(b.html)
p page
#=> This will now be the Yahoo page
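Applied to the scraping loop in the question, the same idea might look like the sketch below. It is not taken from the gist: scrape_each_date, submit_date and parse are hypothetical names, and the two callables are passed in so that the re-parse step is explicit. With the real libraries, parse would be Nokogiri::HTML.method(:parse) and submit_date would fill in and submit the form through Watir.

```ruby
# Sketch only: take a fresh snapshot of the browser's HTML after every
# form submission. All names here are hypothetical, not from the gist.
#
# browser     - any object that responds to #html (e.g. a Watir::Browser)
# dates       - the dates to submit, one at a time
# submit_date - callable that submits one date through the browser
# parse       - callable that turns an HTML string into a document,
#               e.g. Nokogiri::HTML.method(:parse)
def scrape_each_date(browser, dates, submit_date:, parse:)
  dates.map do |date|
    submit_date.call(browser, date)  # Watir changes the page here
    page = parse.call(browser.html)  # new snapshot taken AFTER the change
    yield page, date                 # caller scrapes the table from the fresh page
  end
end
```

The key point is that parse is called inside the loop, after each submission, rather than once before it. If the table is filled in asynchronously, the submit_date callable would also need to wait for the new content to appear before returning; a fixed sleep does not help if the script keeps scraping an old Nokogiri object.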
Upvotes: 3