Reputation: 9908
I know this is something of a "classic question", but does the MySQL/Grails stack (deployed on Tomcat) put a new spin on how to approach storing users' uploaded files?
I like using the database for everything (simpler architecture; scaling is just scaling the database). But using the filesystem means we don't lard up MySQL with binary files. Some might also argue that Apache (httpd) is faster than Tomcat at serving binary files, although I've seen numbers suggesting that putting Tomcat directly at the front of your site can be faster than proxying through Apache (httpd).
How should I choose where to store users' uploaded files?
Thanks for your consideration, time and thought.
Upvotes: 3
Views: 4093
Reputation: 806
Even if you upload files to the filesystem, they all get the same permissions, so any logged-in user can access another user's file just by entering its URL. If you instead give each user a directory, that directory still gets the permissions of the Apache user, since that is the account the server runs as. To get separate ownership you would have to su to root, create a user, and upload files into that user's directory, and then accessing those files could end up requiring you to add each user's group to the server's group. If I choose to use the filesystem to store binary files, is there an easier solution than this? How do you manage access to those files per user and maintain the permissions? Does Spring's ACL help, or do we have to create a permission group for each user? I am totally fine with filesystem URLs. My only concern is having to start a separate process (chmod and the like) with something like ProcessBuilder to run operating system commands. Is there a better solution?
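On the ProcessBuilder point specifically: assuming Java 7 or later and a POSIX filesystem, `java.nio.file` can set permissions in-process, so no external `chmod` is needed. The sketch below is only an illustration; the class name, the numeric `userId` parameter and the one-directory-per-user layout are assumptions, not something from this thread. Note that it only restricts the directory to the account the server runs as. It does not separate one application user from another, so that check would still have to live in the application.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper: one upload directory per user, readable only by the server account.
class PrivateUploadDir {

    // userId is numeric so it cannot contain path separators or "..".
    static Path createPrivateDir(Path root, long userId) throws IOException {
        Path dir = root.resolve(Long.toString(userId));
        Files.createDirectories(dir);
        // In-process equivalent of "chmod 700 dir".
        // Throws UnsupportedOperationException on non-POSIX filesystems (e.g. Windows).
        Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rwx------");
        Files.setPosixFilePermissions(dir, ownerOnly);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("uploads");
        Path dir = createPrivateDir(root, 42L);
        System.out.println(dir + " " + PosixFilePermissions.toString(Files.getPosixFilePermissions(dir)));
    }
}
```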
Upvotes: 0
Reputation: 1076
Another thing to keep in mind is that if your site ever grows beyond one application server, you need to access the same files from all app servers. All app servers already have access to the database, either because it's a single server or because you have a cluster. If you store things in the filesystem, you have to share that too, perhaps over NFS.
Upvotes: 0
Reputation: 6539
Just as an additional suggestion: JCR, a Java Content Repository (e.g. Jackrabbit). It has several benefits when you deal with a lot of binary content. The Grails plugin isn't stable yet, but you can use Jackrabbit with the plain API.
Upvotes: 3
Reputation: 11637
I don't know if one can make general observations about this kind of decision, since it really comes down to what you are trying to do and how high non-functional requirements (NFRs) like performance and response time sit on your application's priority list.
If you have lots of users, uploading lots of binary files, with a system serving large numbers of those uploaded binary files then you have a situation where the costs of storing files in the database include:
Benefits are
Given the same user situation where you store to the filesystem you will need to address
We had a similar problem to solve for our Grails site, where the content editors were uploading hundreds of pictures a day. We knew that driving all that demand through the application, when it could be better used doing other processing, was wasteful (given that the expected demand for pages was going to be in the millions per week, we definitely didn't want images to cripple us).
We ended up creating an upload -> filesystem solution. For each uploaded file, a DB metadata record was created and managed in tandem with the upload process (and, conversely, that record was read when generating the link to the image in the GSP content). We served requests off disk through Apache directly, based on the link requested by the browser. But, and there is always a but, remember that with things like filesystems you only have content per machine.
We had the headache of making sure images got re-synchronised onto every server since, unlike a DB, which sits behind the cluster and enables the cluster to behave uniformly, files are bound to physical locations on a server.
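For anyone wanting a picture of the upload -> filesystem plus metadata record idea, here is a minimal sketch in plain Java. It is not our actual code: the `UploadStore` and `FileRecord` names are made up, the in-memory map stands in for the DB table, and storing each file under a generated id (rather than the uploaded name) is an assumption of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store: bytes go to disk, a small metadata record goes to the "database".
class UploadStore {

    // Stand-in for a DB row (in Grails this would be a domain class).
    static final class FileRecord {
        final String id;
        final String originalName;
        final Path storedAt;
        final long size;

        FileRecord(String id, String originalName, Path storedAt, long size) {
            this.id = id;
            this.originalName = originalName;
            this.storedAt = storedAt;
            this.size = size;
        }
    }

    private final Path root;
    // In-memory map used here only as a placeholder for the metadata table.
    private final Map<String, FileRecord> records = new ConcurrentHashMap<>();

    UploadStore(Path root) {
        this.root = root;
    }

    // Write the content to disk first, then create the metadata record that points at it.
    FileRecord save(String originalName, InputStream content) throws IOException {
        String id = UUID.randomUUID().toString();
        Files.createDirectories(root);
        Path target = root.resolve(id);
        long size = Files.copy(content, target); // returns the number of bytes written
        FileRecord record = new FileRecord(id, originalName, target, size);
        records.put(id, record);
        return record;
    }

    // Looked up when rendering the page, to build the link to the stored file.
    FileRecord find(String id) {
        return records.get(id);
    }

    public static void main(String[] args) throws IOException {
        UploadStore store = new UploadStore(Files.createTempDirectory("uploads"));
        FileRecord rec = store.save("photo.jpg", new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println(rec.originalName + " stored at " + rec.storedAt + " (" + rec.size + " bytes)");
    }
}
```

The record is created only after the copy succeeds, so a failed upload does not leave a metadata row pointing at nothing.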
Another problem you might run up against with filesystems is folder content size. When you start having folders where there are literally tens of thousands of files in them, the folder scan at the OS level starts to really drag. To avert this problem we had to write code which managed image uploads into yyyy/MM/dd/image.name.jpg folder structures, so that no one folder accumulated hundreds of thousands of images.
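The yyyy/MM/dd layout is easy to sketch. This is an illustration rather than the code we ran: the class and method names and the `/var/uploads` root are assumptions, and it uses `java.time`, so it assumes Java 8 or later.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that spreads uploads across yyyy/MM/dd directories.
class DatedUploadPath {

    private static final DateTimeFormatter DAY_DIRS = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    static Path resolve(Path root, LocalDate day, String fileName) {
        // Keep only the last path segment so "../x.jpg" cannot climb out of the day folder.
        Path last = Paths.get(fileName).getFileName();
        String safeName = (last == null) ? "" : last.toString();
        if (safeName.isEmpty() || safeName.equals(".") || safeName.equals("..")) {
            throw new IllegalArgumentException("Not a usable file name: " + fileName);
        }
        // e.g. <root>/2010/03/07/image.name.jpg
        return root.resolve(day.format(DAY_DIRS)).resolve(safeName);
    }

    public static void main(String[] args) {
        Path target = resolve(Paths.get("/var/uploads"), LocalDate.now(), "image.name.jpg");
        System.out.println(target);
    }
}
```

The caller would still need to create the directories (for example with `Files.createDirectories(target.getParent())`) before writing the file.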
What I'm implying is that while we got the performance we wanted by not using the DB for BLOB storage, that comes at the cost of development overhead and systems management.
Upvotes: 5