Bill Zimmerman

Reputation: 13

Caching system for dynamically created files?

I have a web server that is dynamically creating various reports in several formats (pdf and doc files). The files require a fair amount of CPU to generate, and it is fairly common to have situations where two people are creating the same report with the same input.

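Inputs: a string (equations, numbers, and lists of words) of arbitrary length; almost 99% will be less than about 200 words.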

When a user attempts to generate a report, I would like to check to see if a file already exists with the given input, and if so return a link to the file. If the file doesn't already exist, then I would like to generate it as needed.

  1. What solutions are already out there? I've cached simple HTTP requests before, but the keys were extremely simple (usually database IDs).

  2. If I have to do this myself, what is the best way? The input can be several hundred words, and I'm wondering how I should transform these strings into keys for the cache.

    // entire input: uses too much memory, one-to-one mapping
    cache['one two three four five six seven eight nine ten eleven...']

    // short keys
    cache['one two']  => 5 results, then I must narrow these down even more

  3. Is this something that should be done in a database, or is it better done within the web app code (Python in my case)?
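One common way to get around both problems in item 2 (full-string keys are large, short prefixes are ambiguous) is to hash the input down to a fixed-length digest. This is a minimal sketch, assuming that inputs differing only in whitespace always produce the same report; `normalize` and `cache_key` are hypothetical helper names, not part of any library.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Collapse runs of whitespace so inputs that differ only in spacing share a key.
    return re.sub(r"\s+", " ", text).strip()


def cache_key(text: str, fmt: str) -> str:
    # Include the output format so the PDF and DOC versions get separate keys.
    payload = f"{fmt}\n{normalize(text)}".encode("utf-8")
    # SHA-256 always yields a 64-character hex string, however long the input.
    return hashlib.sha256(payload).hexdigest()


print(cache_key("one two three four five six seven eight nine ten eleven", "pdf"))
```

The digest can serve directly as a dictionary key or as a file name such as `<digest>.pdf`; accidental SHA-256 collisions are not a practical concern at this scale.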

Thank you, everyone.

Upvotes: 1

Views: 158

Answers (2)

S.Lott

Reputation: 391952

This is what Apache is for.

Create a directory that will have the reports.

Configure Apache to serve files from that directory.

If the report exists, redirect to a URL that Apache will serve.

Otherwise, the report doesn't exist yet, so create it first. Then redirect to a URL that Apache will serve.
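A minimal Python sketch of the exists-or-create-then-redirect flow described above. `REPORT_DIR`, `REPORT_URL_PREFIX`, `generate_report`, and `report_url` are hypothetical names; the sketch assumes Apache already serves `REPORT_DIR` at `/reports/` and that the caller has turned the input into a safe file name (no `/` or `..`).

```python
import os
import tempfile

# Hypothetical settings: the directory Apache serves static files from, and
# the URL prefix Apache maps to it. A temp dir keeps the sketch runnable.
REPORT_DIR = tempfile.mkdtemp()
REPORT_URL_PREFIX = "/reports/"


def generate_report(path: str, text: str) -> None:
    # Hypothetical stand-in for the CPU-heavy PDF/DOC generation.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def report_url(filename: str, text: str) -> str:
    """Return the URL to redirect to, generating the report only if needed."""
    path = os.path.join(REPORT_DIR, filename)
    if not os.path.exists(path):
        # Generate into a uniquely named temp file in the same directory, then
        # rename it into place, so Apache never serves a half-written report.
        fd, tmp_path = tempfile.mkstemp(dir=REPORT_DIR, suffix=".part")
        os.close(fd)
        generate_report(tmp_path, text)
        os.replace(tmp_path, path)
    # The web framework would answer with an HTTP redirect to this URL.
    return REPORT_URL_PREFIX + filename


print(report_url("report_42.pdf", "one two three"))
```

If two users request the same new report at the same moment, both may still generate it, but the rename means each finished file is complete and the last one wins.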


There's no "hashing". You have a key ("a string (equations, numbers, and lists of words), arbitrary length, almost 99% will be less than about 200 words") and a value, which is a file. Don't waste time on a hash. You just have a long key.

You can compress this key somewhat by making a "slug" out of it: remove punctuation, replace spaces with _, that kind of thing.
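A minimal sketch of such a slug function; `slugify` is a hypothetical name, and exactly which characters to keep is an assumption. Note that dropping punctuation is lossy: distinct equations such as `x + y` and `x * y` produce the same slug, and long inputs can exceed the file-name limit of common filesystems (255 bytes on ext4, for example).

```python
import re


def slugify(key: str) -> str:
    # Drop anything that is not a word character, whitespace, or hyphen.
    slug = re.sub(r"[^\w\s-]", "", key)
    # Replace each run of whitespace with a single underscore.
    return re.sub(r"\s+", "_", slug.strip())


print(slugify("Hello, world!  x + y = 2"))
print(slugify("x + y") == slugify("x * y"))
```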

You should create an internal surrogate key which is a simple integer.
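One way to assign that surrogate integer is a table with a `UNIQUE` column for the long key. This sketch uses Python's built-in `sqlite3` only to stay self-contained; the table name `report_keys` and the helper `surrogate_id` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path in a real deployment
conn.execute(
    "CREATE TABLE IF NOT EXISTS report_keys ("
    " id INTEGER PRIMARY KEY,"
    " input TEXT NOT NULL UNIQUE)"
)


def surrogate_id(conn: sqlite3.Connection, key: str) -> int:
    # Insert the long key if it is new; the UNIQUE constraint turns a repeat into a no-op.
    conn.execute("INSERT OR IGNORE INTO report_keys (input) VALUES (?)", (key,))
    conn.commit()
    # Look up the integer id assigned to this key (new or existing).
    row = conn.execute("SELECT id FROM report_keys WHERE input = ?", (key,)).fetchone()
    return row[0]


report_id = surrogate_id(conn, "one two three four five")
print(f"{report_id}.pdf")
```

The report file can then be named after the integer, which stays short no matter how long the original input is.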

You're simply translating a long key to a "report" which either exists as a file or will be created as a file.

Upvotes: 2

John La Rooy

Reputation: 304375

The usual approach is to put a caching reverse proxy such as Squid or Varnish in front of the web app.
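A reverse proxy can only reuse a response when identical inputs produce an identical request URL and the response says it may be cached. This minimal WSGI sketch assumes the input is sent as a GET query parameter named `input` (a hypothetical name) and marks the response cacheable with a standard `Cache-Control` header; `build_report` is a hypothetical stand-in for the real generator. Very long inputs may hit server URL-length limits.

```python
from urllib.parse import parse_qs


def build_report(text: str) -> bytes:
    # Hypothetical stand-in for the expensive PDF/DOC generation step.
    return ("REPORT: " + text).encode("utf-8")


def app(environ, start_response):
    # Identical inputs give identical URLs, which is what the proxy keys on.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    text = params.get("input", [""])[0]
    body = build_report(text)
    start_response("200 OK", [
        ("Content-Type", "text/plain"),  # a real app would send application/pdf
        ("Content-Length", str(len(body))),
        # Allows shared caches such as Squid or Varnish to store it for a day.
        ("Cache-Control", "public, max-age=86400"),
    ])
    return [body]
```

For a quick local test, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app, with the proxy configured to forward to that port.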

Upvotes: 1
