Bob Herrmann

Reputation: 9908

uploaded files - database vs filesystem, when using Grails and MySQL

I know this is something of a "classic question", but does the MySQL/Grails stack (deployed on Tomcat) put a new spin on how to approach storing users' uploaded files?

I like using the database for everything (simpler architecture, and scaling is just scaling the database). But using the filesystem means we don't lard up MySQL with binary files. Some might also argue that Apache (httpd) is faster than Tomcat for serving binary files, although I've seen numbers showing that putting Tomcat directly at the front of your site can be faster than going through an Apache (httpd) proxy.

How should I choose where to place users' uploaded files?

Thanks for your consideration, time and thought.

Upvotes: 3

Views: 4093

Answers (4)

Pramod

Reputation: 806

Even if you store uploads on the filesystem, all the files get the same permissions, so any logged-in user can reach another user's file just by entering its URL. If you instead plan to give each user a directory, those directories still get the permissions of the Apache user, since that is the account the server runs as. To get real per-user permissions you would have to su to root, create an OS user, and upload files into that user's directories, and then serving those files could mean adding each user's group to the server's group. If I choose the filesystem to store binary files, is there an easier solution than this? How do you manage access to those files per user and maintain the permissions? Does Spring's ACL help, or do we have to create a permission group for each user? I am totally fine with filesystem URLs. My only concern is having to start a separate process (chmod and the like) with something like ProcessBuilder to run operating system commands. Is there a better solution?

Upvotes: 0

Karsten Silz

Reputation: 1076

Another thing to keep in mind is that if your site ever grows beyond one application server, you need to access the same files from all app servers. All app servers already have access to the database, either because it's a single server or because you have a cluster. If you store things in the file system, you have to share that too, perhaps over NFS.

Upvotes: 0

Siegfried Puchbauer

Reputation: 6539

Just as an additional suggestion: consider JCR, the Java Content Repository API (e.g. Jackrabbit). It has several benefits when you deal with a lot of binary content. The Grails plugin isn't stable yet, but you can use Jackrabbit through the plain API.

Upvotes: 3

j pimmel

Reputation: 11637

I don't know if one can make general observations about this kind of decision, since it really comes down to what you are trying to do and how high non-functional requirements (NFRs) like performance and response time sit on your application's priority list.

If you have lots of users uploading lots of binary files, and a system serving large numbers of those files, then the costs of storing files in the database include:

  • Large binary files
  • Costly queries

Benefits are:

  • Atomic commits
  • Scaling comes with the database (though with MySQL there are some issues with multi-node setups etc.)
  • Less fiddly and complicated code to manage file systems etc

Given the same usage scenario, if you store to the filesystem you will need to address:

  • Scaling
  • File name management (a user uploads a file with the same name twice, etc.)
  • Creating corresponding records in the DB to map to the files on disk (and the code surrounding all that)
  • Looking after your Apache configs so they serve from the filesystem
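
The file-name and DB-mapping points in this list can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code described in this answer: the class `StoredUpload`, its fields, and the `ownerId` parameter are hypothetical names. It assumes each upload is written to disk under a random UUID, so two uploads with the same name never collide, while the user's original name survives only in the metadata you would persist to the DB.

```java
import java.util.Locale;
import java.util.UUID;

/** Hypothetical metadata for one uploaded file; in a real app this would map to a DB row. */
final class StoredUpload {
    final long ownerId;          // assumed id of the uploading user
    final String originalName;   // what the user called the file; for display only
    final String storedName;     // unique name actually used on disk

    private StoredUpload(long ownerId, String originalName, String storedName) {
        this.ownerId = ownerId;
        this.originalName = originalName;
        this.storedName = storedName;
    }

    /** Builds the metadata for a new upload, generating a collision-free on-disk name. */
    static StoredUpload forUpload(long ownerId, String originalName) {
        String storedName = UUID.randomUUID().toString() + extensionOf(originalName);
        return new StoredUpload(ownerId, originalName, storedName);
    }

    /** Returns a lower-cased extension such as ".jpg", or "" if there is no plain one. */
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        // No dot, a leading dot (".bashrc"), or a trailing dot: treat as no extension.
        if (dot <= 0 || dot == name.length() - 1) {
            return "";
        }
        String ext = name.substring(dot).toLowerCase(Locale.ROOT);
        // Accept only short alphanumeric extensions so path characters never reach the disk name.
        return ext.matches("\\.[a-z0-9]{1,10}") ? ext : "";
    }
}
```

Calling `StoredUpload.forUpload(userId, "Holiday.JPG")` twice yields two different `storedName` values that both end in `.jpg`. The page shows `originalName` to the user, and only `storedName` ever touches the filesystem.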

We had a similar problem to solve for our Grails site, where the content editors upload hundreds of pictures a day. We knew that driving all that demand through the application was wasteful when it could be better used doing other processing (given that expected demand for pages was going to be in the millions per week, we definitely didn't want images to cripple us).

We ended up creating an upload -> file system solution. For each uploaded file, a DB metadata record was created and managed in tandem with the upload process (and, conversely, that record was read when generating the GSP content link to the image). We served requests off disk through Apache directly, based on the link requested by the browser. But, and there is always a but, remember that with filesystems the content exists only on the machine it was written to.

We had the headache of making sure images got re-synchronised onto every server, since unlike a DB, which sits behind the cluster and enables the cluster to behave uniformly, files are bound to physical locations on a server.

Another problem you might run up against with filesystems is folder content size. When you start having folders where there are literally tens of thousands of files in them, the folder scan at the OS level starts to really drag. To avert this problem we had to write code which managed image uploads into yyyy/MM/dd/image.name.jpg folder structures, so that no one folder accumulated hundreds of thousands of images.
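
As a rough illustration of that yyyy/MM/dd layout, here is a minimal Java sketch. It is not our original code: the class `DatedUploadPaths` and the method `resolve` are hypothetical names, and it assumes the upload root and upload date are passed in by the caller. It only computes the target path, and it keeps just the last segment of the submitted name so a value like `../../etc/passwd` cannot climb out of the dated folder.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;

/** Hypothetical helper that spreads uploads across yyyy/MM/dd sub-folders. */
final class DatedUploadPaths {

    private DatedUploadPaths() {
    }

    /** Returns uploadRoot/yyyy/MM/dd/name, where name is the last segment of fileName. */
    static Path resolve(Path uploadRoot, LocalDate uploadDate, String fileName) {
        // Keep only the final path segment of whatever the client sent.
        Path last = Paths.get(fileName).getFileName();
        String safeName = (last == null) ? "" : last.toString();
        // Reject names that are empty or would still point at a directory.
        if (safeName.isEmpty() || safeName.equals(".") || safeName.equals("..")) {
            throw new IllegalArgumentException("Not a usable file name: " + fileName);
        }
        // Resolve each date part separately so the platform's own separator is used.
        return uploadRoot
                .resolve(String.format("%04d", uploadDate.getYear()))
                .resolve(String.format("%02d", uploadDate.getMonthValue()))
                .resolve(String.format("%02d", uploadDate.getDayOfMonth()))
                .resolve(safeName);
    }
}
```

Before writing the file you would still need to create the folders, for example with `Files.createDirectories(target.getParent())`. If a single day can itself receive tens of thousands of files, a further level (an hour, or a prefix of the file name) would be needed.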

What I'm implying is that while we got the performance we wanted by not using the DB for BLOB storage, that comes at the cost of development overhead and systems management.

Upvotes: 5
