Reputation: 110
In my scenario I'm scraping a site whose content is generated by JavaScript. I know that Watir is well suited to this, but as we all know, it adds some overhead and makes the program take longer.
Currently I am able to log in by sending a POST request with the username/password and parsing the response with Nokogiri, which, as you can imagine, is really fast. After a successful login I go to the address where the content I want to scrape lives, but that content is the result of JavaScript processing, so Nokogiri is no help from that point on.
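Roughly, that fast part looks like the sketch below (the URLs and form field names are only placeholders; the POST itself goes through Net::HTTP and Nokogiri just parses the response):
require 'net/http'
require 'uri'
require 'nokogiri'
# Placeholder login endpoint and form fields -- the real ones come from the site's login form.
login_uri = URI('https://example.com/login')
response  = Net::HTTP.post_form(login_uri, 'username' => 'me', 'password' => 'secret')
session_cookie = response['Set-Cookie']   # keep the session for the next request
# Fetch the page that holds the content I want, re-using the session cookie.
page_uri = URI('https://example.com/data')
request  = Net::HTTP::Get.new(page_uri)
request['Cookie'] = session_cookie
raw_html = Net::HTTP.start(page_uri.host, page_uri.port, use_ssl: true) { |http| http.request(request) }.body
doc = Nokogiri::HTML(raw_html)   # parses fine, but the JavaScript-generated parts are missing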
What I want to do is have Watir load the HTML already returned to Nokogiri so that it runs all the JavaScript, and then continue scraping the generated HTML, without having to use Watir from the start, in order to reduce processing time.
Is there a way to load content into Watir from an HTML string so it gets processed, instead of invoking the goto method?
Upvotes: -1
Views: 515
Reputation: 39695
You can always try.
require 'open-uri'
require 'nokogiri'
require 'watir'
# previous stuff that builds `nokodoc`
# File.open in "w" mode creates the file, so a separate `touch` isn't needed.
File.open("temp.html", "w") {|f| f.write(nokodoc.to_html)}
b = Watir::Browser.new
b.goto("file://#{Dir.pwd}/temp.html")
Upvotes: 0
Reputation: 4194
The direct answer to your question is no. Watir is not designed to scrape web pages but to test them, and testing means navigating to pages and interacting with them.
Additionally, if your HTML parser does not solve your problem, then copying text from that parser into Watir won't fix it either.
Upvotes: 0