Abhishek Jain

Reputation: 6376

How to store a copy of complete web page at server side as soon as it is rendered on client browser?

The requirement is to keep, as a historical record, a copy of the complete web page on the server side exactly as it was rendered in the client's browser. These records are revisited later.

We are trying to store the HTML of the rendered web page. That HTML is then rendered using resources (JavaScript, CSS and images) that live on the server. These resources change over time, so old records no longer render correctly.

Is there any other way to solve this? We are also considering converting the page to PDF using the iText or Apache FOP APIs, but they do not take the effect of JavaScript on the page into account during conversion. Are there any Java APIs available to achieve this?

So far, no approach has worked perfectly. Please suggest one.

Edit: In summary, the requirement is to create an exact copy of the rendered web page on the server side in order to record user activity on that page.

Upvotes: 2

Views: 1774

Answers (5)

Matt K

Reputation: 6708

If you're storing the html page, why not the references to the js, css, and images too?

I don't know what your implementation looks like now, but you should store all of the HTML pages and resources on the filesystem and keep references to their locations in a database. You should back up the resources in the filesystem every time you change them!

I use this approach for an image archive. When a client passes us the URL of an image, we want to be able to go back and see exactly what the image was at the time they sent it (since it's a URL, it can change at any time). I have a script that downloads the image as soon as we receive the URL, stores it in the filesystem, and then stores the path to the file in the database along with various other details. This is similar to what you need, just with a couple more rows in your table for the JS, CSS and image paths.
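A minimal Java sketch of that download-on-receipt step, assuming Java 11+. `ResourceArchiver` and the directory-per-capture-time layout are hypothetical choices for illustration, not the answerer's actual script; the returned path is what would go into the database row.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Instant;

// Hypothetical helper: download a resource as soon as its URL arrives and keep a
// dated copy on disk; the returned path is what would be stored in the database.
final class ResourceArchiver {
    private final Path archiveRoot;

    ResourceArchiver(Path archiveRoot) {
        this.archiveRoot = archiveRoot;
    }

    Path archive(URI source, Instant receivedAt) throws IOException {
        // One directory per capture time, so a later download never overwrites an earlier one.
        Path dir = archiveRoot.resolve(Long.toString(receivedAt.toEpochMilli()));
        Files.createDirectories(dir);

        // Use the last path segment as the file name, falling back to index.html.
        String path = source.getPath();
        String name = (path == null || path.isEmpty() || path.endsWith("/"))
                ? "index.html"
                : path.substring(path.lastIndexOf('/') + 1);

        Path target = dir.resolve(name);
        try (InputStream in = source.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Because the copy is taken when the URL is received, later changes at the source cannot alter the archived version.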

Upvotes: 0

Dilum Ranatunga

Reputation: 13374

Depending on just how sophisticated your javascript is, and depending on how faithfully you want to capture what the client saw, you may be undertaking an impossible task.

At a high level, you have the following options:

  1. Keep a copy of everything you send to the client
  2. Get the client to return back exactly whatever it has rendered
  3. Build your system in such a way that you can actually fetch all historical versions of the constituent resources if/when you need to reproduce a browser's view.

You can do #1 using servlet filters, etc., but it doesn't address issues such as JavaScript fetching dynamic HTML content during rendering on the client.

Getting the client to return what they are seeing (#2) is tricky, and bandwidth intensive.

So I would opt for #3. To make a website that renders dynamic content versioned, you have to do several things. First, all data sources need to be versioned too, so every query would need to specify the version. The "version" can be a timestamp or a generation counter that you maintain. If you take this approach, you also need to ensure that any JavaScript you send to the client does not fetch external resources directly. Instead, it should request all resources from your system, which in turn fetches the external content (or reuses it from a cache).
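A minimal in-memory sketch of a versioned data source, where every read names the version it wants. `VersionedStore` is a hypothetical class for illustration; a real system would keep the history in its database, but the lookup rule (take the latest value whose version is not newer than the requested one) is the same.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical illustration: every write is kept, and every read specifies
// the version (timestamp or generation counter) it wants to see.
final class VersionedStore<K, V> {
    private final Map<K, TreeMap<Long, V>> history = new HashMap<>();

    // Record the value of key as of the given version.
    void put(K key, long version, V value) {
        history.computeIfAbsent(key, k -> new TreeMap<>()).put(version, value);
    }

    // Return the value that was current at the given version, if one existed.
    Optional<V> get(K key, long version) {
        TreeMap<Long, V> versions = history.get(key);
        if (versions == null) {
            return Optional.empty();
        }
        // floorEntry = greatest version <= the requested one
        Map.Entry<Long, V> entry = versions.floorEntry(version);
        return entry == null ? Optional.empty() : Optional.ofNullable(entry.getValue());
    }
}
```

Replaying a stored page then means issuing every query with the version recorded for that page, e.g. `store.get("site.css", pageVersion)`.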

Upvotes: 1

Cjxcz Odjcayrwl

Reputation: 22867

A very resource-consuming requirement but...

You haven't said which application server and framework you are using. If you're generating responses in your own code, you can simply store them as you generate them.

Another possibility is to write a filter that wraps the servlet's OutputStream and logs everything written to it; you just have to make sure your filter is at the top of the filter chain.
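The core of such a filter is a "tee" stream that writes each byte to the real response and to a copy. The sketch below shows only that part, so it compiles without the Servlet API; in an actual filter you would return a `ServletOutputStream` (and a `PrintWriter`) delegating to it from an `HttpServletResponseWrapper`, then persist `captured()` after `chain.doFilter(...)` returns. `CapturingOutputStream` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical tee stream: everything goes to the client AND into an in-memory copy.
final class CapturingOutputStream extends OutputStream {
    private final OutputStream delegate;                          // the real response stream
    private final ByteArrayOutputStream copy = new ByteArrayOutputStream();

    CapturingOutputStream(OutputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(int b) throws IOException {
        delegate.write(b);
        copy.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        delegate.write(b, off, len);
        copy.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    // Bytes written so far; store these once the filter chain has finished.
    byte[] captured() {
        return copy.toByteArray();
    }
}
```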

Another very powerful, generic and easy-to-manage solution, though possibly the most resource-consuming one: write a transparent proxy server that sits between the user and the application server, forwards each call to the app server and returns the exact response, while also saving each request and response.
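A rough sketch of such a recording proxy using only JDK classes (`com.sun.net.httpserver.HttpServer` and the Java 11+ `java.net.http.HttpClient`). `RecordingProxy` is hypothetical and deliberately simplified: it handles only GET, does not forward headers (so cookies and sessions would be lost), and keeps the records in memory rather than persisting them.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical recording proxy: forwards GET requests to the app server, returns the
// same status and body, and keeps a copy of each exchange. Headers are NOT forwarded.
final class RecordingProxy {
    static final class Exchange {
        final String path;
        final int status;
        final byte[] body;

        Exchange(String path, int status, byte[] body) {
            this.path = path;
            this.status = status;
            this.body = body;
        }
    }

    private final URI upstream;
    private final HttpClient client = HttpClient.newHttpClient();
    private final List<Exchange> log = new CopyOnWriteArrayList<>();
    private HttpServer server;

    RecordingProxy(URI upstream) {
        this.upstream = upstream;
    }

    // Binds to 127.0.0.1 on the given port (0 = any free port) and returns the bound port.
    int start(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", this::forward);
        server.start();
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    List<Exchange> recorded() {
        return log;
    }

    private void forward(HttpExchange ex) throws IOException {
        try {
            URI in = ex.getRequestURI();
            String pathAndQuery = in.getRawPath()
                    + (in.getRawQuery() == null ? "" : "?" + in.getRawQuery());
            HttpResponse<byte[]> resp;
            try {
                resp = client.send(HttpRequest.newBuilder(upstream.resolve(pathAndQuery)).GET().build(),
                        HttpResponse.BodyHandlers.ofByteArray());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                ex.sendResponseHeaders(502, -1);
                return;
            }
            byte[] body = resp.body();
            log.add(new Exchange(pathAndQuery, resp.statusCode(), body)); // the saved record
            ex.sendResponseHeaders(resp.statusCode(), body.length == 0 ? -1 : body.length);
            if (body.length > 0) {
                try (OutputStream out = ex.getResponseBody()) {
                    out.write(body);
                }
            }
        } finally {
            ex.close();
        }
    }
}
```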

Upvotes: 0

Hoppy

Reputation: 136

wkhtmltopdf should do this quite nicely for you. It will take a URL, and return a pdf.

code.google.com/p/wkhtmltopdf

Example:

wkhtmltopdf http://www.google.com google.pdf
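Since the question asks for Java, one option is to call the wkhtmltopdf binary from Java with `ProcessBuilder`. This sketch assumes wkhtmltopdf is installed and on the `PATH`, and uses only the `wkhtmltopdf <url> <output>` form shown in the example; `PagePdfSnapshot` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical wrapper around the wkhtmltopdf command-line tool.
// Assumes the wkhtmltopdf binary is installed and on the PATH.
final class PagePdfSnapshot {
    // Same shape as the CLI example: wkhtmltopdf <url> <output.pdf>
    static List<String> command(String url, Path output) {
        return List.of("wkhtmltopdf", url, output.toString());
    }

    // Runs wkhtmltopdf and returns its exit code (0 normally means success).
    static int render(String url, Path output) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(url, output))
                .inheritIO()   // show the tool's progress/errors on this process's console
                .start();
        return p.waitFor();
    }
}
```

Note that wkhtmltopdf fetches and renders the URL itself, so the PDF reflects what the server returns at capture time rather than a particular user's browser session.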

Upvotes: 1

Sid

Reputation: 7631

The answer would depend on the server technology being used to write the HTML. Are you using Java/JSPs or Servlets or some sort of an HTTPResponse object to push the HTML/data to the browser?

If only the CSS/JS/HTML are changing, why don't you just take snapshots of your client-side codebase and store them as website versions?

If other data is involved (like XML/JSON), take a snapshot of it and version that as well. The snapshot of the client codebase described above, combined with a snapshot of the data taken at the same time, should then give you the exact rendering of your website as of that point in time.
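A minimal sketch of taking such a snapshot: copy the client-side codebase (HTML/CSS/JS) into a folder named after a version id such as a release tag or timestamp. `ClientSnapshot` and the folder layout are hypothetical; the same call could be applied to a directory of exported XML/JSON data so both snapshots share one version id.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: copy a directory tree into snapshotsRoot/<versionId>.
final class ClientSnapshot {
    static Path snapshot(Path clientDir, Path snapshotsRoot, String versionId) throws IOException {
        Path target = snapshotsRoot.resolve(versionId);
        if (Files.exists(target)) {
            // Never overwrite an existing version: old records must stay reproducible.
            throw new IOException("snapshot already exists: " + target);
        }
        // Files.walk visits a directory before its contents, so parents are created first.
        try (Stream<Path> paths = Files.walk(clientDir)) {
            paths.forEach(src -> {
                Path dest = target.resolve(clientDir.relativize(src).toString());
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dest);
                    } else {
                        Files.copy(src, dest);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause();
        }
        return target;
    }
}
```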

Upvotes: 0
