I'm trying to scrape a site where I can only rely on classes and element hierarchy to find the right nodes. But `Mechanize::Page#search` returns `Nokogiri::XML::Element`s, which I can't use to fill in and submit forms, etc.
I'd really like to use pure CSS selectors. Matching by class is admittedly straightforward with the various `_with` methods as well, but matching something like `:not(.class)` is pretty verbose compared to a plain CSS selector, and I have no idea how to match on element hierarchy.
Is there a way to convert Nokogiri elements back to Mechanize objects or, even better, get them straight from the `search` method?
As stated in this answer, you can simply construct a new `Mechanize::Form` object from the `Nokogiri::XML::Element` retrieved via `Mechanize::Page#search` or `Mechanize::Page#at`:
```ruby
require 'mechanize'

agent = Mechanize.new
page = agent.get 'https://stackoverflow.com/'
# Get the search form via its ID as a Nokogiri::XML::Element
node = page.at '#search'
# Wrap it in a Mechanize::Form, passing the agent and the page as context
form = Mechanize::Form.new node, agent, page
# Use it!
form.q = 'Foobar'
result = form.submit
```
Note: You have to pass both the `Mechanize` agent and the `Mechanize::Page` to the constructor to be able to submit the form. Otherwise you just get a `Mechanize::Form` object without the context it needs to submit.
There seems to be no central utility function that converts `Nokogiri::XML::Element`s into Mechanize objects; instead, the conversions are implemented wherever they are needed. Consequently, writing a method that searches the document by CSS or XPath and returns Mechanize objects where applicable would require a fairly big switch-case on the node type. Not exactly what I had in mind.