Reputation: 6376
The requirement is to keep, on the server side, a copy of each complete web page exactly as it was rendered in the client browser, as a historical record. These records are revisited later.
We are trying to store the HTML of the rendered web page. When a record is viewed, that HTML is rendered using the JavaScript, CSS and image resources currently on the server. These resources keep changing, so old records are no longer rendered correctly.
Is there any other way to solve this? We are also considering converting the page to PDF using iText or the Apache FOP API, but neither takes the effect of JavaScript on the page into account during conversion. Are there any Java APIs available to achieve this?
So far, no approach has worked perfectly. Please suggest.
Edit: In summary, the requirement is to create an exact copy of the rendered web page on the server side in order to record user activity on that page.
Upvotes: 2
Views: 1774
Reputation: 6708
If you're storing the HTML page, why not also store the JS, CSS and images it references?
I don't know what your current implementation looks like, but you should store all of the HTML pages and resources on the filesystem and keep references to their locations in a database. You should also back up the resources on the filesystem every time you change them!
I use this implementation for an image archive. When a client passes us the URL of an image, we want to be able to go back and see exactly what the image was at the time they sent it (since it's a URL, it can change at any time). I have a script that downloads the image as soon as we receive the URL, stores it in the filesystem, and then stores the path to the file in the database along with various other details. This is similar to what you need, just with a couple more rows in your table for the JS, CSS and image paths.
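A minimal Java sketch of that archive idea, assuming Java 16+ (records). The class and record names are hypothetical, the `fetch` helper is just one way to download a resource with `java.net.http.HttpClient` (not the answerer's actual script), and an in-memory list stands in for the database table. Each capture gets a fresh file name, so later changes at the same URL never overwrite an older copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical "table row": original URL, where our copy lives, and when it was taken.
record ArchivedResource(String url, Path storedAt, Instant capturedAt) {}

class ResourceArchive {
    private final Path root;
    // Stand-in for the database table; a real system would INSERT a row instead.
    private final List<ArchivedResource> table = new ArrayList<>();

    ResourceArchive(Path root) {
        this.root = root;
    }

    // Downloads whatever is currently served at `url` (JS, CSS, image, ...).
    static byte[] fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofByteArray()).body();
    }

    // Saves `content` under a fresh file name (never overwriting an older copy)
    // and records where it went.
    ArchivedResource archive(String url, byte[] content) {
        try {
            Files.createDirectories(root);
            Path target = root.resolve(UUID.randomUUID() + ".bin");
            Files.write(target, content);
            ArchivedResource row = new ArchivedResource(url, target, Instant.now());
            table.add(row);
            return row;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back exactly what was stored for a row, whatever the URL serves today.
    byte[] read(ArchivedResource row) {
        try {
            return Files.readAllBytes(row.storedAt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    List<ArchivedResource> rows() {
        return List.copyOf(table);
    }
}
```

Capturing `app.js` twice, before and after a deployment, yields two rows pointing at two different files, so an old page can still be matched with the script it originally loaded.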
Upvotes: 0
Reputation: 13374
Depending on just how sophisticated your javascript is, and depending on how faithfully you want to capture what the client saw, you may be undertaking an impossible task.
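At a high level, you have the following options:
1. Capture the HTML on the server as it is sent to the client.
2. Have the client send back what it is actually displaying.
3. Version your whole site (code and data) so that any past page can be regenerated exactly as it was served.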
You can do #1 using JSP filters, etc., but it doesn't address issues such as JavaScript fetching dynamic HTML content while the page is rendering on the client.
Getting the client to return what they are seeing (#2) is tricky and bandwidth-intensive.
So I would opt for #3. To make a website that renders dynamic content versioned, you have to do several things. First, all data sources need to be versioned too, so every query must specify the version. The "version" can be a timestamp or a generation counter that you maintain. If you take this approach, you also need to ensure that any JavaScript you send to the client does not fetch external resources directly. Instead, it should request all resources from your system, which in turn fetches the external content (or reuses it from a cache).
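To illustrate what "every query specifies the version" can mean, here is a minimal in-memory sketch, assuming versions are `long` values (timestamps or a generation counter). `VersionedStore` is a hypothetical name, not a library class: writes never overwrite history, and a read at version v returns the latest value written at or before v.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical versioned key/value store. A "version" is a long: a timestamp
// or a generation counter you maintain.
class VersionedStore<V> {
    // For each key, every value ever written, ordered by the version it was written at.
    private final Map<String, NavigableMap<Long, V>> history = new HashMap<>();

    // Writes never overwrite history: they add a new entry at `version`.
    synchronized void put(String key, long version, V value) {
        history.computeIfAbsent(key, k -> new TreeMap<>()).put(version, value);
    }

    // The value as it was at `version`: the latest write at or before that version.
    synchronized Optional<V> get(String key, long version) {
        NavigableMap<Long, V> versions = history.get(key);
        if (versions == null) {
            return Optional.empty();
        }
        Map.Entry<Long, V> entry = versions.floorEntry(version);
        return entry == null ? Optional.empty() : Optional.ofNullable(entry.getValue());
    }
}
```

An archived page record would then store the version it was rendered at, and re-rendering it later passes that same version to every query, so newer writes are invisible to it.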
Upvotes: 1
Reputation: 22867
A very resource-consuming requirement but...
You haven't said which application server and framework you are using. If you're generating responses in your own code, you can simply store each response while generating it.
Another possibility is to write a filter that wraps the servlet's OutputStream and logs everything written to it; you just have to make sure your filter is at the top of the filter chain.
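The core of such a filter is an output stream that forwards every byte to the client while keeping a copy. The sketch below shows only that piece, using the JDK alone; wrapping it in an `HttpServletResponseWrapper` and registering the filter first in the chain is assumed and omitted. `TeeOutputStream` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Forwards every byte to the real response stream and keeps an in-memory copy.
class TeeOutputStream extends FilterOutputStream {
    private final ByteArrayOutputStream copy = new ByteArrayOutputStream();

    TeeOutputStream(OutputStream target) {
        super(target);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);   // to the client, unchanged
        copy.write(b);  // and into the archive copy
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        copy.write(b, off, len);
    }

    // Everything sent so far; read it after the filter chain has returned.
    byte[] captured() {
        return copy.toByteArray();
    }
}
```

After `chain.doFilter(...)` returns, the filter would persist `captured()` together with the request details. A complete wrapper must also cover `getWriter()`, because many servlets and JSPs write through a `PrintWriter` rather than the raw output stream.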
Another very powerful and generic solution, and the easiest to manage, though possibly the most resource-consuming: write a transparent proxy server that sits between the user and the application server, forwards each call to the app server and returns its exact response, additionally saving each request and response.
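A rough sketch of such a recording proxy, using the JDK's built-in `com.sun.net.httpserver.HttpServer` and `java.net.http.HttpClient` (Java 16+ for records). Assumptions: the application server is reachable at the `upstreamBase` URL; only the method, path, query, body and request `Content-Type` are forwarded upstream; response headers are copied back except hop-by-hop ones; and an in-memory list stands in for persistent storage. `RecordingProxy` and `Exchange` are hypothetical names.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

// One saved request/response pair (hypothetical shape).
record Exchange(String method, String uri, int status, byte[] responseBody) {}

class RecordingProxy {
    // Per-connection headers that must not be copied through.
    private static final Set<String> HOP_BY_HOP =
            Set.of("connection", "keep-alive", "transfer-encoding", "content-length");

    private final HttpServer server;
    private final String upstreamBase; // e.g. "http://appserver:8080" (assumed)
    private final HttpClient client =
            HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
    private final List<Exchange> log = new CopyOnWriteArrayList<>();

    RecordingProxy(int port, String upstreamBase) {
        this.upstreamBase = upstreamBase;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/", this::forward);
    }

    private void forward(HttpExchange ex) throws IOException {
        try {
            byte[] requestBody = ex.getRequestBody().readAllBytes();
            HttpRequest.Builder upstream = HttpRequest.newBuilder(URI.create(upstreamBase + ex.getRequestURI()))
                    .method(ex.getRequestMethod(), requestBody.length == 0
                            ? HttpRequest.BodyPublishers.noBody()
                            : HttpRequest.BodyPublishers.ofByteArray(requestBody));
            String contentType = ex.getRequestHeaders().getFirst("Content-Type");
            if (contentType != null) {
                upstream.header("Content-Type", contentType);
            }
            HttpResponse<byte[]> response;
            try {
                response = client.send(upstream.build(), HttpResponse.BodyHandlers.ofByteArray());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                ex.sendResponseHeaders(502, -1);
                return;
            } catch (IOException e) {
                ex.sendResponseHeaders(502, -1); // app server unreachable
                return;
            }
            byte[] body = response.body();
            // Save the exchange exactly as the app server produced it, then answer the user.
            log.add(new Exchange(ex.getRequestMethod(), ex.getRequestURI().toString(), response.statusCode(), body));
            response.headers().map().forEach((name, values) -> {
                if (!HOP_BY_HOP.contains(name.toLowerCase(Locale.ROOT))) {
                    ex.getResponseHeaders().put(name, new ArrayList<>(values));
                }
            });
            ex.sendResponseHeaders(response.statusCode(), body.length == 0 ? -1 : body.length);
            if (body.length > 0) {
                try (OutputStream os = ex.getResponseBody()) {
                    os.write(body);
                }
            }
        } finally {
            ex.close();
        }
    }

    void start() { server.start(); }
    void stop() { server.stop(0); }
    int port() { return server.getAddress().getPort(); }
    List<Exchange> log() { return List.copyOf(log); }
}
```

With no executor set, the server handles one request at a time on its own thread; a production proxy would configure an executor, stream large bodies instead of buffering them, and forward cookies and the other request headers.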
Upvotes: 0
Reputation: 136
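wkhtmltopdf should do this quite nicely for you. It takes a URL and returns a PDF, and because it renders the page with a WebKit engine, the page's JavaScript runs before the PDF is produced.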
Example:
```shell
wkhtmltopdf http://www.google.com google.pdf
```
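Since the question asks for a Java route, one option is to run the same command through `ProcessBuilder`. This sketch assumes `wkhtmltopdf` is installed and on the server's `PATH`; `PdfSnapshot` is a hypothetical helper. The `--javascript-delay` option (in milliseconds) gives the page's scripts time to finish before the PDF is captured.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper that shells out to wkhtmltopdf (assumed to be installed and on PATH).
class PdfSnapshot {
    // Builds the command line; --javascript-delay waits the given number of
    // milliseconds for page scripts before capturing.
    static List<String> command(String url, Path output, int javascriptDelayMs) {
        return List.of("wkhtmltopdf",
                "--javascript-delay", Integer.toString(javascriptDelayMs),
                url, output.toString());
    }

    // Runs wkhtmltopdf and returns its exit code (0 normally means success).
    static int render(String url, Path output, int javascriptDelayMs)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(url, output, javascriptDelayMs))
                .inheritIO() // show wkhtmltopdf's progress and errors in our console
                .start();
        return process.waitFor();
    }
}
```

For example, `PdfSnapshot.render("http://www.google.com", Path.of("google.pdf"), 2000)` runs the command shown above with a two-second script delay.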
Upvotes: 1
Reputation: 7631
The answer depends on the server technology being used to write the HTML. Are you using Java/JSPs, Servlets, or some sort of HTTPResponse object to push the HTML/data to the browser?
If only the CSS/JS/HTML are changing, why don't you just take snapshots of your client-side codebase and store them as website versions?
If other data is involved (like XML/JSON), take a snapshot of that too and version it as well. The snapshot of the client codebase described above, combined with the snapshot of the data from the same time, should then give you the exact rendering of your website as it was at that point in time.
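One way to get a version id for a client codebase snapshot is to hash the files themselves, so the id changes exactly when some HTML, CSS, JS or image file changes. A sketch assuming the deployed client code lives under one directory and Java 17+ (`HexFormat`); `CodebaseVersion` and `PageSnapshot` are hypothetical names, and `dataVersion` is whatever id your data snapshots use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical record kept with each archived page: the client-code snapshot and the
// data snapshot that were live when it was rendered.
record PageSnapshot(String pageId, String codebaseVersion, String dataVersion) {}

class CodebaseVersion {
    // Derives a version id from every file under `root`. Identical trees give identical
    // ids; any added, removed, renamed or edited file gives a new one.
    static String of(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> files = walk.filter(Files::isRegularFile).sorted().toList();
            MessageDigest tree = sha256();
            for (Path file : files) {
                String fileHash = HexFormat.of().formatHex(sha256().digest(Files.readAllBytes(file)));
                // One "relative-path NUL file-hash" line per file, in sorted order.
                String line = root.relativize(file) + "\0" + fileHash + "\n";
                tree.update(line.getBytes(StandardCharsets.UTF_8));
            }
            return HexFormat.of().formatHex(tree.digest());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must support SHA-256", e);
        }
    }
}
```

Computing the id at deploy time and storing it, with the current data snapshot id, in a `PageSnapshot` for each archived page tells you which client code and which data to load when that page is revisited.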
Upvotes: 0