Reputation: 53843
I've got a proxy written in Django which receives requests for certain files. After deciding whether the user is allowed to see the file the proxy gets the file from a remote service and serves it to the user. There's a bit more to it but this is the gist.
This setup works great for single files, but there is a new requirement: users want to download multiple files together as a single zip. The files are sometimes small but can also get really large (100 MB plus), and a request can cover anywhere from 2 up to 1000 files at once. Fetching all those files, zipping them, and serving the archive within the same request can become a real burden.
I read about the possibility of creating "streaming zips": a way to open a zip and start sending the files in it until you close it. I found a couple of PHP examples, and for Python the django-zip-stream extension. They all assume locally stored files, and the Django extension also assumes the use of nginx.
There are a couple things I wonder about in my situation:
Does anybody know whether streaming zips are a good idea with my setup of very large remote files? I'm a bit afraid that many concurrent requests will easily DoS our servers because of CPU or memory limits.
I can also build a queue which zips the files and sends an email to the user, but if possible I'd like to keep the application as stateless as possible.
Upvotes: 1
Views: 599
Reputation: 5849
This sounds to me like a perfect use case to be solved by queueing jobs and processing them in the background.
Advantages: the user's request returns immediately, and the expensive zipping work is queued and processed at a sustainable pace in the background instead of inside the request/response cycle. The second advantage is particularly desirable since you're prepared to receive multiple concurrent requests.
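In a real project you would reach for Celery, RQ, or the django-task app mentioned below, but the pattern itself can be illustrated with just the standard library: requests enqueue a job and return, while a single worker thread drains the queue one job at a time, so a burst of requests lines up instead of exhausting CPU and memory at once. This is only a sketch of the idea, not production code (a daemon thread dies with the process; a proper broker survives restarts).

```python
import queue
import threading

jobs = queue.Queue()

def worker():
    """Single background worker: zip jobs run one at a time."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel used to shut the worker down
            jobs.task_done()
            break
        try:
            job()  # e.g. fetch the files, build the zip, store the result
        finally:
            jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
```

A view would then do something like `jobs.put(lambda: build_zip(task_id))` and immediately return a "your download is being prepared" response.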
I would also consider using a "task" Django model with a FileField as the container for the resulting zip file, so it can be served statically and efficiently by Nginx from the media folder. As an additional benefit, you can monitor what's going on directly from the Django admin user interface.
I’ve used a similar approach in many Django projects, and it has proven quite robust and manageable; you might want to take a quick look at the following Django app I’m using for that: https://github.com/morlandi/django-task
To summarize: enqueue a task that fetches the files and builds the zip, store the result in the task's FileField, and let Nginx serve the finished archive from the media folder.
Upvotes: 5
Reputation: 104
Ok, this is a tough one!
After the first request you could create the zipped file and save it on the file servers, so the file servers always deliver zipped files from then on. The first request will take longer because the zip file has to be created, but every following request gets the already-zipped file, as long as it hasn't been deleted.
a) You could deliver a single stream that ends up as a tape archive (a tar file) containing all the zipped files.
-- or --
In case of a DoS attack you could limit the number of concurrent download requests. If there are too many requests at the same time, the extra ones are bounced back and the users have to try again later.
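That bounce-back behaviour boils down to a non-blocking concurrency limit. A minimal sketch (the `DownloadThrottle` class is my own invention, not a Django API) uses a bounded semaphore: a view that fails to acquire a slot would immediately return HTTP 429 instead of queueing up.

```python
import threading

class DownloadThrottle:
    """Allow at most `limit` concurrent downloads; reject the rest
    instead of letting them pile up on the server."""
    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self):
        """Return True if a slot was grabbed; False means 'bounce back'."""
        return self._slots.acquire(blocking=False)

    def release(self):
        """Free the slot once the download response has finished."""
        self._slots.release()
```

Note this per-process counter only works with a single application process; across several workers you'd track the count in something shared, like Redis or the database.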
Upvotes: 0