user1128637

Reputation: 227

How do I search then parse results on a webpage with Ruby?

How would you use Ruby to open a website, enter a query in its search field, and then parse the results? For example, entering something into a search engine and then parsing the results page. I know how to use Nokogiri to fetch and open the webpage, but I am lost on how to enter text into the search field and move on to the results. Also, on the page I am actually searching, I have to click an "Enter" button; I can't simply press the Enter key to move forward. Thank you so much for your help.

Upvotes: 2

Views: 492

Answers (2)

Amadan

Reputation: 198556

Use Mechanize, a library for automating interaction with websites.

Upvotes: 5

Steve S

Reputation: 136

Something like Mechanize will work, but driving the front-end UI is always going to be slower and more problematic than making requests directly against the back end.

Your best bet would be to look at the request that is being made to the server (probably an HTTP GET or POST request with some associated params). You can do this with Firebug, or with Fiddler 2 on Windows. Then, once you know the parameters that the server will accept, just make the request yourself.

For example, if you were doing this with the duckduckgo.com search engine, you could either get Mechanize to go to duckduckgo.com, enter text into the search box, and click submit, or you could just make a GET request to http://www.duckduckgo.com?q=search_term_here.
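As a rough sketch of the direct-request route, here is one way to do it with only Ruby's standard library. The helper names build_search_uri and fetch_results_html are made up for illustration, and the base URL and the "q" parameter are assumptions based on the duckduckgo example above; another site would need whatever URL and parameter names you see in the captured request.

```ruby
require 'uri'
require 'net/http'

# Build the search URL. The "q" parameter name is an assumption taken from
# the duckduckgo example; other sites will use different names.
def build_search_uri(base_url, term)
  uri = URI.parse(base_url)
  uri.query = URI.encode_www_form('q' => term)
  uri
end

# Fetch the raw HTML of the results page (hypothetical helper).
# Net::HTTP.get does not follow redirects on its own.
def fetch_results_html(base_url, term)
  Net::HTTP.get(build_search_uri(base_url, term))
end

puts build_search_uri('https://duckduckgo.com/', 'ruby mechanize')
# prints https://duckduckgo.com/?q=ruby+mechanize
```

The string returned by fetch_results_html can then be handed to Nokogiri (for example Nokogiri::HTML(html)) for parsing, the same way you already parse a page you have opened.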

You can use Mechanize for something like this, but it might be overkill. I would take a look at RestClient, especially if you don't need to manage cookies.

Edit:

If you can determine the specific URL that the form submits to (say, 'example.com/search') and you know the request is a POST (which it usually is if you are submitting a form), you could construct something like this with Mechanize:

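require 'mechanize'

string_to_search_for = 'search term'  # placeholder: the text you would type into the field

agent = Mechanize.new
# Keys are the form fields' name attributes; values are what the form would send.
page = agent.post 'http://example.com/search', {
  "_id0:Number" => string_to_search_for,
  "_id0:submitButton" => "Enter"
}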

Notice how the 'name' attribute of each form element becomes a key in the post and its 'value' attribute becomes the value. For the text 'input' element, the value is simply the text you would have typed in. The browser turns these pairs into a request and submits it to the server when you push the submit button (in this case, of course, you are making the request directly). The result of the post should be some HTML that you can parse for the info you need.
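To make the name/value mapping concrete, here is a small standard-library sketch of what the form body looks like on the wire. The field names are the ones from the example above, the HTML in the comment is only a guess at what such a form might look like, and form_body is a made-up helper name.

```ruby
require 'uri'

# Assumed form markup, for illustration only:
#   <input type="text"   name="_id0:Number">
#   <input type="submit" name="_id0:submitButton" value="Enter">
def form_body(search_text)
  URI.encode_www_form(
    '_id0:Number'       => search_text, # text typed into the field
    '_id0:submitButton' => 'Enter'      # the button's value attribute
  )
end

puts form_body('12345')
# prints _id0%3ANumber=12345&_id0%3AsubmitButton=Enter
```

Mechanize does this encoding for you when you pass it a hash, so you only need something like this if you are building the request by hand, for instance with Net::HTTP.post_form.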

Upvotes: 1
