raphinesse
raphinesse

Reputation: 20958

How can I get Mechanize objects from Mechanize::Page's search method?

I'm trying to scrape a site where I can only rely on classes and element hierarchy to find the right nodes. But using Mechanize::Page#search returns Nokogiri::XML::Elements which I can't use to fill and submit forms etc.

I'd really like to use pure CSS selectors but matching for classes seems to be pretty straight forward with the various _with methods too. However, matching things like :not(.class) is pretty verbose compared to simply using CSS selectors while I have no idea how to match for element hierarchy.

Is there a way to convert Nokogiri elements back to Mechanize objects or even better get them straight from the search method?

Upvotes: 4

Views: 5444

Answers (1)

raphinesse
raphinesse

Reputation: 20958

Like stated in this answer you can simply construct a new Mechanize::Form object using your Nokogiri::XML::Element retrieved via Mechanize::Page#search or Mechanize::Page#at:

a = Mechanize.new
page = a.get 'https://stackoverflow.com/'

# Get the search form via ID as a Nokogiri::XML::Element
form = page.at '#search'

# Convert it back to a Mechanize::Form object
form = Mechanize::Form.new form, a, page

# Use it!
form.q = 'Foobar'
result = form.submit

Note: You have to provide the Mechanize object and the Mechanize::Page object to the constructor to be able to submit the form. Otherwise it would just be a Mechanize::Form object without context.


There seems to be no central utility function to convert Nokogiri::XML::Elements to Mechanize elements but rather the conversions are implemented where they are needed. Consequently, writing a method that searches the document by CSS or XPath and returns Mechanize elements if applicable would require a pretty big switch-case on the node type. Not exactly what I imagined.

Upvotes: 7

Related Questions