user1427661

Reputation: 11774

Getting Dynamically Generated HTML With Nokogiri/Open URI

I'm trying to scrape a site by inspecting its HTML in Chrome and extracting the data with Nokogiri. The problem is that some of the tags are dynamically generated, so they don't appear in the response to an open(url) request made with open-uri. Is there a way to "force" a site to generate its dynamic content so that a tool like open-uri can read it?

Upvotes: 5

Views: 2062

Answers (1)

Chris Heald

Reputation: 62638

If reading the page via open-uri doesn't produce the content you need, chances are good that the client is generating that content with JavaScript.

This may be good news: by inspecting the AJAX requests that the page makes (for example, in the Network tab of Chrome's developer tools), you might find a JSON feed of the content you're looking for, which you can then request and parse directly. That gets you the data without having to dig through the HTML at all.
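As a rough illustration of requesting and parsing such a feed directly, here is a minimal sketch using only Ruby's standard library. The endpoint URL, the `items` key, and the `title` field are all hypothetical; the real names would come from whatever request you find the page making.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Fetch the raw body of a JSON feed. The URL is whatever AJAX endpoint
# you found the page calling; nothing here is specific to a real site.
def fetch_json(url)
  response = Net::HTTP.get_response(URI(url))
  raise "Request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Pull the titles out of a feed shaped like {"items": [{"title": "..."}]}.
# The "items" and "title" keys are assumptions made for this sketch.
def extract_titles(json_text)
  data = JSON.parse(json_text)
  data.fetch('items', []).map { |item| item['title'] }
end

# Usage (hypothetical endpoint):
#   body = fetch_json('https://example.com/api/items.json')
#   puts extract_titles(body)
```

Keeping the parsing separate from the HTTP call makes it easy to test against a saved copy of the response while you work out the feed's structure.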

If that doesn't work for some reason, though, you're going to need to open the page with some kind of browser, let it execute its client-side JavaScript, then dump the resulting DOM to HTML. Something like PhantomJS is an excellent choice for this kind of work.
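The "open it in a browser, then dump the DOM" approach can be sketched from Ruby by shelling out to a headless browser. Note that this sketch uses headless Chrome's `--dump-dom` flag rather than PhantomJS, and the binary name `chromium` is an assumption (on your system it may be `google-chrome`, `chromium-browser`, or a full path). The helper names are hypothetical.

```ruby
require 'open3'

# Build the command line that asks headless Chrome to load a page,
# run its JavaScript, and print the resulting DOM as HTML.
# The default browser binary name is an assumption; adjust as needed.
def dump_dom_command(url, browser: 'chromium')
  [browser, '--headless', '--disable-gpu', '--dump-dom', url]
end

# Run the browser and return the rendered HTML as a string.
def rendered_html(url, browser: 'chromium')
  html, status = Open3.capture2(*dump_dom_command(url, browser: browser))
  raise "Browser exited with status #{status.exitstatus}" unless status.success?
  html
end

# Usage (requires a Chrome/Chromium install and the nokogiri gem):
#   require 'nokogiri'
#   doc = Nokogiri::HTML(rendered_html('https://example.com/'))
#   puts doc.css('h1').map(&:text)
```

The returned string can be handed to Nokogiri exactly as you would hand it the output of open-uri; the only difference is that the page's scripts have already run. Pages that load content after a delay may still need a driver that can wait for specific elements.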

Upvotes: 4
