ltfishie

Reputation: 2987

Caching for file server

I have a Java file server that serves files over HTTP. Each file is uniquely addressable by an ID like so:

http://fileserver/id/123455555

I am looking to add a caching layer so that the most frequently accessed files stay in memory. I would also like to control the total size of the cache. I am thinking of using Ehcache or OSCache for this, but I have only used them to cache serialized objects before. Would they be a good choice, and are there any additional considerations for building a file cache?

Edit

Thanks for all the answers. Some more details about the file server, to simplify (or complicate) the problem:

  1. Once a file is saved, it is never modified.
  2. An MD5 hash is used to avoid duplicating files on save. (I am aware of possible collision and security concerns.)
  3. The file server runs on Linux boxes.

Edit 2

Though the server itself does not limit the file types it supports, the files are mostly images (JPG, GIF, PNG) and Word, Excel and PDF documents no bigger than 10 MB.

Upvotes: 3

Views: 1168

Answers (4)

Stevie

Reputation: 8152

Take advantage of the HTTP protocol

Your most effective caching mechanism by far will be to move caching off your own server and as close to the client as possible (data locality ;)). Use the HTTP protocol effectively to allow clients and caching proxies to do the caching whenever they can appropriately do so:

  • Set ETags using some function of each file's content (e.g. its MD5 sum); cache this value too, so you don't recalculate it on each request!
  • Set Expires / Last-Modified / Cache-Control headers as appropriate

edit: You updated to say that the files are never modified, so I would suggest setting the Expires header to a far-future date.
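
As a minimal sketch of the header side, assuming you can reuse the MD5 you already compute for de-duplication as a strong ETag: the class and method names below are hypothetical, and wiring the map into your actual servlet or HTTP handler is left out. A request whose If-None-Match matches the ETag can be answered with 304 Not Modified and no body.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: builds caching headers for files that never change once saved.
final class CacheHeaders {
    // IMF-fixdate format required for HTTP dates, e.g. "Wed, 01 Jan 2025 00:00:00 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    // Strong ETag from the content's MD5; compute once at save time and store it with the file.
    static String etagFor(byte[] content) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return "\"" + hex + "\"";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Far-future expiry: clients and proxies may reuse the response without revalidating.
    static Map<String, String> headersFor(String etag, ZonedDateTime now) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("ETag", etag);
        headers.put("Cache-Control", "public, max-age=31536000, immutable");
        headers.put("Expires", HTTP_DATE.format(now.plusYears(1).withZoneSameInstant(ZoneOffset.UTC)));
        return headers;
    }

    // True when the client's If-None-Match already names this ETag, so a 304 is enough.
    static boolean notModified(String ifNoneMatch, String etag) {
        if (ifNoneMatch == null) return false;
        for (String candidate : ifNoneMatch.split(",")) {
            String c = candidate.trim();
            if (c.startsWith("W/")) c = c.substring(2); // weak comparison is fine for GET
            if (c.equals("*") || c.equals(etag)) return true;
        }
        return false;
    }
}
```

Because the content behind an ID never changes, the ETag never needs recalculating, and most repeat requests should either never reach the server or cost only a 304.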

... Now to answer the question more directly ...

EhCache

In my experience EhCache is a fine choice and can satisfy the requirements you've mentioned.

You mentioned "the most frequently accessed files stay in memory", so it seems relevant to note that, according to some performance testing I did (several years ago now), the LFU (Least Frequently Used) eviction policy was a lot slower than LRU (Least Recently Used) on cache writes: something like 30 times slower, in fact. This is a product of the additional bookkeeping LFU needs compared with LRU.

It would be a good idea to check the data usage pattern you really see in production to understand which eviction policy works best for you. In most circumstances I would suggest LRU as a starting point, as it approximates LFU when the cache is large enough and there are no significant bursts of unusual data access.
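
To make the LRU-plus-size-limit idea concrete, here is a JDK-only sketch of an in-heap cache bounded by total bytes rather than entry count. It is not EhCache's API; the class and method names are hypothetical, and it simply illustrates what a byte-bounded LRU policy does (EhCache or Guava would give you this plus statistics and tuning).

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache: file ID -> file bytes, evicting least recently used
// entries whenever the total payload exceeds maxBytes.
final class LruFileCache {
    private final long maxBytes;
    private long currentBytes = 0;
    // accessOrder = true: iteration runs from least to most recently accessed.
    private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);

    LruFileCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    synchronized byte[] get(String id) {
        return entries.get(id); // a hit moves the entry to the most-recent end
    }

    synchronized void put(String id, byte[] data) {
        if (data.length > maxBytes) return; // can never fit; keep serving it from disk
        byte[] previous = entries.put(id, data); // new or updated entry lands at the tail
        if (previous != null) currentBytes -= previous.length;
        currentBytes += data.length;
        // Evict from the head (least recently used) until we are back under budget.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            currentBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    synchronized boolean contains(String id) {
        return entries.containsKey(id); // does not count as an access
    }

    synchronized long sizeInBytes() {
        return currentBytes;
    }
}
```

With files up to 10 MB, weighing entries by byte length (rather than counting them) is what keeps the memory budget predictable.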

OSCache

I have not used OSCache, so cannot say anything there.

Other considerations

  1. In his answer Peter Lawrey suggested using the OS cache. Whilst this means you pay a penalty for the read-through from Java to native code, I think the idea has great merit, since it avoids a significant problem of caching in the Java heap: the garbage collector has extra work to do trawling the large heap. (An alternative solution is off-heap caching, for example via BigMemory, but that has its own trade-offs.)
  2. If the content is compressible, you probably want to consider caching a compressed (gzipped) version of the file (otherwise you will end up re-compressing it every time it is served!). This is one argument against relying on the OS disk cache. Of course, compression has other caveats (e.g. the content must be large enough to warrant compressing and must compress reasonably well), so it really does depend on what is in those files.
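
A rough sketch of "compress once, serve many times", assuming a simple in-memory map keyed by file ID; the class and method names are hypothetical, and the Accept-Encoding check is deliberately naive (it ignores q-values). Note that JPG, GIF and PNG are already compressed, so in practice this mostly pays off for formats that still compress well.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.GZIPOutputStream;

// Hypothetical cache of gzipped variants, so each file is compressed at most once.
final class GzipVariantCache {
    private final ConcurrentHashMap<String, byte[]> gzipped = new ConcurrentHashMap<>();

    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // complete only after the stream is closed
    }

    // Compresses on first request for this ID; later calls return the cached bytes.
    byte[] gzippedFor(String id, byte[] raw) {
        return gzipped.computeIfAbsent(id, k -> gzip(raw));
    }

    // Send the gzipped variant only if the client accepts it and it is actually smaller.
    static boolean shouldSendGzip(String acceptEncoding, byte[] raw, byte[] compressed) {
        return acceptEncoding != null
                && acceptEncoding.toLowerCase(Locale.ROOT).contains("gzip")
                && compressed.length < raw.length;
    }
}
```

When the gzipped variant is sent, the response also needs a "Content-Encoding: gzip" header (and "Vary: Accept-Encoding" so shared caches keep the two variants apart).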

Upvotes: 1

Peter Lawrey

Reputation: 533500

IMHO, you are better off making use of the OS disk cache, as this has several advantages:

  • It's much simpler, as the OS does all the real work.
  • The OS can use all the available free memory, which can vary depending on what else the system is doing.
  • You don't double up with the disk cache (as it is the disk cache).

The OS will keep the most recently used files in memory anyway.
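
A minimal sketch of what relying on the OS cache can look like, assuming (hypothetically) that each file is stored under a base directory with its ID as the file name. There is no Java-side cache at all: hot files are served from the kernel's page cache, and on Linux `FileChannel.transferTo` can typically use sendfile(2) when the target is a socket, so the bytes need not pass through the Java heap.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical server helper: streams the file for an ID, leaving caching to the OS.
final class PageCacheServer {
    private final Path baseDir;

    PageCacheServer(Path baseDir) {
        this.baseDir = baseDir.toAbsolutePath().normalize();
    }

    // Copies the whole file to `out` and returns the number of bytes written.
    long serve(String id, WritableByteChannel out) throws IOException {
        Path file = baseDir.resolve(id).normalize();
        if (!file.startsWith(baseDir)) { // reject IDs like "../etc/passwd"
            throw new IllegalArgumentException("bad id: " + id);
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            long pos = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (pos < size) {
                pos += ch.transferTo(pos, size - pos, out);
            }
            return pos;
        }
    }
}
```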

Upvotes: 1

Chandra

Reputation: 333

Ehcache provides the ability to do web caching as well. You may want to try that: http://www.ehcache.org/documentation/user-guide/web-caching

Upvotes: 1

ollins

Reputation: 1849

Guava cache? http://code.google.com/p/guava-libraries/wiki/CachesExplained

  • nice API
  • time based eviction
  • size based eviction

Upvotes: 2
