Talvi Watia

Reputation: 1080

Non-browser emulation of JavaScript - is it possible?

I am working on a new project that involves fetching a webpage (using PHP and cURL), parsing the HTML and JavaScript out of it, and then handling the data in the results.

Basically, I hit a brick wall when the site uses JavaScript to fetch its data by AJAX. In this case, the initial data will not appear in the fetched page unless the JavaScript is run in a browser.

Are there any PHP libraries for this? (I suspect not, but I could be wrong.)

I would really rather build this as a server-based solution; otherwise I am forced to build an application for this using Mozilla and/or IE runtime libraries, which kind of defeats the purpose.

Upvotes: 10

Views: 10273

Answers (8)

Nick Lockwood

Reputation: 40995

All these answers seem to presume that there is no possibility of JavaScript emulation in PHP, but there is a near-fully-compliant open-source PHP JavaScript emulator here:

http://www.sitepoint.com/blogs/2006/01/19/j4p5-javascript-for-php5/

Combined with Env.js, you could get pretty close to a full server-side JS execution solution.

Upvotes: 1

Joel

Reputation: 30156

I know you have said no Java, but for reference you might be interested in Qt Jambi. It includes an implementation of WebKit which you can run in headless mode.

Upvotes: 1

bobince

Reputation: 536379

You will need:

  • one JavaScript interpreter
  • one DOM Level 2 Core and HTML implementation
  • 500g of non-standard but commonly-used DOM extensions
  • a pinch of DOM Level 2 Style (which might mean also a CSS interpreter and layout engine)
  • yoghurt pots, round-ended scissors and sticky-back plastic

Once you have assembled your components (remember to get a grown-up to help you with the sandboxing), you'll find what you have is essentially indistinguishable from a web browser.

"JAVA is not part of the shell build on the server. V8/SquirrelFish is C++ code I would need to convert to PHP."

Porting a JS engine to PHP would be a huge task, and the resulting performance would likely be horrible. You can't even really get away with a near-solution for JavaScript any more, since so many pages use hideously complex libraries like jQuery to do everything, which require in-depth JS support.

I don't think you're going to be able to do this purely in PHP. You'll have to hook up Java/Rhino/HTMLUnit or a proper web browser like Mozilla. If your hosting environment doesn't give you the flexibility you need to compile and deploy that sort of thing, you'd have to move to a better hosting setup with a shell (preferably a VPS).

If you can avoid this unpleasantness some other way, by special-casing known pages' AJAX access, do that.
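
As a rough illustration of that special-casing, here is a minimal sketch in Python (the same idea carries over to PHP and cURL). It assumes you have already found, by watching the page's network traffic in a browser, which URL the page's script calls and that the response is JSON. The URLs, the `items` key, and all function names are hypothetical placeholders, not anything from a real site.

```python
import json
import urllib.request

# Hypothetical URLs: the endpoint the page's script calls, and the page itself.
AJAX_URL = "https://example.com/ajax/items?page=1"
PAGE_URL = "https://example.com/items"


def build_ajax_request(ajax_url, page_url):
    """Build a request that resembles the page's own XMLHttpRequest call."""
    return urllib.request.Request(
        ajax_url,
        headers={
            # Many AJAX endpoints check for this header; it is an assumption here.
            "X-Requested-With": "XMLHttpRequest",
            "Referer": page_url,
            "Accept": "application/json",
        },
    )


def parse_items(body):
    """Decode a JSON response body and return its 'items' list (assumed shape)."""
    data = json.loads(body)
    return data.get("items", [])


def fetch_items(ajax_url, page_url, timeout=10):
    """Fetch the endpoint directly, without executing any JavaScript."""
    request = build_ajax_request(ajax_url, page_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_items(response.read().decode("utf-8"))
```

Calling `fetch_items(AJAX_URL, PAGE_URL)` would then return the data the script normally loads, provided the endpoint does not depend on cookies or tokens that are themselves set by JavaScript.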

Upvotes: 17

Jason Orendorff

Reputation: 45086

Previously asked here: headless internet browser?

At Mozilla we get this question a lot. There's no good answer. What you want is a software library that implements pretty much everything a browser needs to do (at least as far as networking, JavaScript, HTML parsing, and the DOM), but with no display.

The closest thing I know of is HTMLUnit (in Java).

Upvotes: 3

Ben Dunlap

Reputation: 1846

You'll have to go one step further than Rhino if you want to execute real live web pages, because the JavaScript on those pages will expect to be able to use objects that are native to a browser environment. A server-side JavaScript engine like Rhino won't have those objects.

John Resig (creator of jQuery) started a project called Env.js a couple of years ago; it might be what you're looking for, but I suspect you'll have a tough time getting consistent results from a wide variety of web pages. Here's his initial blog post about it:

http://ejohn.org/blog/bringing-the-browser-to-the-server/

Some similar projects are named in that post's comments.

Upvotes: 4

tyranid

Reputation: 13318

To be honest, you will have a hard time using just a JS engine, as you also have to create the environment of a browser scripting engine, such as the DOM and window objects. If you are running on a Windows server, then you could fairly easily use the IE COM objects to load and execute the web page, accessing the DOM programmatically and pulling the contents back out. As for a Linux server and/or Mozilla, I unfortunately have no experience.

But really what are you trying to do?

Upvotes: 0

Jani Hartikainen

Reputation: 43243

You can run a JavaScript engine such as Rhino on a server.

Here are a few alternatives:

  • Rhino (Java based)
  • V8 (Used by Chrome, C++)
  • SquirrelFish (C++)

While these can run JS, I'm not sure that what you are doing is the best approach. However, since you haven't specified the purpose of your program, I can't offer any suggestions in that regard.

Upvotes: 4

RageZ

Reputation: 27313

You could take a look at Rhino. It uses Java; I have never heard of a PHP port.

Are you obligated to run the actual JavaScript?

Upvotes: 0
