Coldblackice

Reputation: 3520

Is there a library that can trudge through AJAX/JavaScript?

I'm using PHP to scrape some information off webpages; however, I've discovered that the info I'm trying to scrape is loaded through some manner of AJAX/JavaScript. I thought I remembered that cURL could iterate through the JavaScript, but I've found that's not the case.

I seem to remember some sort of backend "web browser" library/function that could trace through the JavaScript and AJAX to arrive at the final page result that a full-featured browser would produce.

Is there a library or function that can do this? Any ideas on how to go about this, other than having to manually trace through the scripts/redirects myself? It doesn't have to be pretty -- I'm just looking to scrape the resulting text.

Upvotes: 0

Views: 97

Answers (2)

pguardiario

Reputation: 54984

Maybe not in PHP, but in other languages there's: Watir/WatiN, Selenium (and the watir/selenium-webdriver bindings), capybara-webkit, Celerity, PhantomJS, and Node.js, which runs JS directly. There are also iMacros and similar commercial options.

But I usually find that I can get the data I want without any of these by just looking at the requests the page is making, recreating them, and parsing the responses.
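That request-replay approach can be sketched in PHP with cURL. Everything specific here is a placeholder assumption: the endpoint URL, the header set, and the JSON shape would all be copied from the actual request the page makes, as seen in the browser's developer tools (Network tab):

```php
<?php
// Sketch of replaying an XHR yourself instead of driving a browser.
// The URL and headers are hypothetical — substitute the real ones
// observed in the browser's Network tab.
function fetchXhr($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => [
            'X-Requested-With: XMLHttpRequest', // some endpoints check for this
            'Accept: application/json',
        ],
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Such endpoints usually answer with JSON rather than HTML, so parsing is
// just json_decode() — demonstrated on a sample payload of the assumed shape:
$sample = '{"items":[{"title":"First"},{"title":"Second"}]}';
$data   = json_decode($sample, true);
foreach ($data['items'] as $item) {
    echo $item['title'], "\n";
}
```

The upside is that you get structured data directly and skip HTML parsing entirely; the downside is that the endpoint may require cookies or session tokens from a prior page load, which you'd also have to replay.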

Upvotes: 1

Aleks G

Reputation: 57316

I don't think there is such a library. If you're really desperate and have lots of time on your hands, you can, of course, download the source code of Firefox, for example, and build yourself something useful. However, I don't think that would be the best use of your resources, or anybody else's.

Note that even Google's indexing bot does not process AJAX. Here is what Google has to say about it. It's quite possible that the site you're dealing with does support this, in which case you can try using Google's technique; on the whole, though, you're unfortunately out of luck.
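For context, the Google technique referred to here was the (since-deprecated) AJAX crawling scheme: a site using hashbang URLs (`#!`) advertised a static snapshot of each page at a rewritten URL with an `_escaped_fragment_` query parameter. A scraper could request that snapshot instead of executing the JavaScript. A minimal sketch of the URL rewrite (the function name and example URL are illustrative):

```php
<?php
// Rewrite a hashbang URL into its AJAX-crawling-scheme snapshot URL:
//   https://example.com/page#!state=2
//     -> https://example.com/page?_escaped_fragment_=state%3D2
// Only sites that opted into the scheme serve anything at the rewritten URL.
function escapedFragmentUrl($url)
{
    $parts = explode('#!', $url, 2);
    if (count($parts) < 2) {
        return $url; // no hashbang, nothing to rewrite
    }
    list($base, $fragment) = $parts;
    $sep = (strpos($base, '?') === false) ? '?' : '&';
    return $base . $sep . '_escaped_fragment_=' . rawurlencode($fragment);
}

echo escapedFragmentUrl('https://example.com/page#!state=2'), "\n";
// -> https://example.com/page?_escaped_fragment_=state%3D2
```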

Upvotes: 1
