Reputation: 7326
I am working on a concurrent file download process, but I am not sure what approach to take.
About:
An application bundles a bunch of files together into a zip file. The files are usually available on the hard drive in a common location (for example /tmp). However, there are cases when the files are not there and need to be downloaded from a remote HTTP server.
Question:
How can I download multiple files concurrently and ensure that NO other thread (bundling files) downloads the same file at the same time?
Moreover, when multiple instances of the application run at the same time (remember that the files are all located in a common location), how can I ensure that no two instances download the same file at the same time?
Please describe a strategy and, if possible, a way to implement it. Perhaps a solution to the above issue already exists.
Thank you!
Upvotes: 1
Views: 1117
Reputation: 20475
You could use a queue or a database to track the files that need downloading. Keep a 'status' column: a thread marks a file as 'fetching' before it starts the download and sets it to 'done' when it finishes. Also keep a last-change timestamp, so that if a file has been 'fetching' for a long time, the download can be stopped or restarted.
Using a database for this file queue can also ensure that other apps don't fetch the same file multiple times, as long as they all check and update the same table (you could maybe persist download state there too, etc.). You can also have multiple downloads running at once, and the database could be used to track download speed, progress, and so on.
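As a rough illustration of the 'status' column idea, here is a minimal sketch using SQLite from Python. Everything named here is an assumption for the example, not something from the question: the `downloads` table, the `try_claim` and `mark_done` helpers, and the 10-minute staleness limit. It also assumes the database file lives on a local disk that all instances can reach.

```python
import sqlite3
import time

STALE_AFTER_SECONDS = 600  # assumption: a 'fetching' row older than this is abandoned


def open_queue(path):
    # isolation_level=None means autocommit: each statement is its own transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS downloads (
               name TEXT PRIMARY KEY,
               status TEXT NOT NULL,
               changed_at REAL NOT NULL
           )"""
    )
    return conn


def try_claim(conn, name, now=None):
    """Return True if the caller should download `name`, False otherwise."""
    now = time.time() if now is None else now
    # The PRIMARY KEY lets only one INSERT per name succeed, so only one
    # thread or process ever gets rowcount == 1 here.
    cur = conn.execute(
        "INSERT OR IGNORE INTO downloads (name, status, changed_at) "
        "VALUES (?, 'fetching', ?)",
        (name, now),
    )
    if cur.rowcount == 1:
        return True
    # Someone else holds the row. Take it over only if it is still
    # 'fetching' and its timestamp is older than the staleness limit.
    cur = conn.execute(
        "UPDATE downloads SET changed_at = ? "
        "WHERE name = ? AND status = 'fetching' AND changed_at < ?",
        (now, name, now - STALE_AFTER_SECONDS),
    )
    return cur.rowcount == 1


def mark_done(conn, name, now=None):
    now = time.time() if now is None else now
    conn.execute(
        "UPDATE downloads SET status = 'done', changed_at = ? WHERE name = ?",
        (now, name),
    )
```

A worker would call `try_claim(conn, "a.bin")` before downloading and `mark_done(conn, "a.bin")` afterwards. A caller that gets `False` should wait for the row to become 'done' rather than start its own download.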
In the future, your question should include specific code and a specific problem. As written, it is very broad and invites discussion (better suited for chat) rather than a single answer that someone else might use.
Upvotes: 1
Reputation: 39457
Here is a possible strategy:
In case of a single app: have some sort of dispatcher thread which reads work from a queue (it could also be a persisted queue, such as a DB table) and spawns a new thread for each item it reads. By "read" I mean read and remove from the queue.
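The single-app dispatcher could be sketched roughly as follows in Python. The names `run_dispatcher`, `fetch`, and `demo` are hypothetical, and the `seen` set is an addition of this sketch: it is what stops the same file from being handed to two threads, and it assumes a `None` item is used to signal shutdown.

```python
import queue
import threading


def run_dispatcher(work, fetch):
    """Read file names from `work` until None arrives; start one thread per new name."""
    seen = set()   # touched only by the dispatcher, so it needs no lock
    workers = []
    while True:
        name = work.get()          # get() reads and removes the item
        if name is None:           # assumption: None is the shutdown signal
            break
        if name in seen:           # already dispatched, do not download twice
            continue
        seen.add(name)
        t = threading.Thread(target=fetch, args=(name,))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()


def demo():
    fetched, lock = [], threading.Lock()

    def fake_fetch(name):          # stands in for the real HTTP download
        with lock:
            fetched.append(name)

    work = queue.Queue()
    for name in ["a.bin", "b.bin", "a.bin", None]:
        work.put(name)
    run_dispatcher(work, fake_fetch)
    return sorted(fetched)
```

Because only the dispatcher decides what gets downloaded, the worker threads never need to coordinate with each other. Note that this sketch never removes a name from `seen`, so a failed download would not be retried.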
In case of multiple apps: have that queue stored in a shared DB (or any other shared storage). In this case there may be a single, separate dispatcher app which just reads work items (or portions of work) from the DB and hands them out to the worker apps. Each worker app asks the dispatcher app for work, which ensures that only the dispatcher app reads from the DB (or whatever central storage you decide to use). This in turn eliminates the need to synchronize access to your DB (permanent storage).
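One way to picture the separate dispatcher app is a small local server that worker apps connect to whenever they want work. The sketch below uses Python's `multiprocessing.connection` module for that. The `"next"` and `"shutdown"` messages, the function names, and the in-memory `pending` list (standing in for the DB table) are all assumptions of the example, and the demo runs the workers as threads only to keep it self-contained.

```python
import threading
from multiprocessing.connection import Client, Listener


def run_dispatcher(listener, pending):
    """Serve one request at a time; each pending name is handed out once."""
    while True:
        with listener.accept() as conn:
            if conn.recv() == "shutdown":
                return
            # Only this loop touches `pending`, so no locking is needed.
            conn.send(pending.pop(0) if pending else None)


def request_work(address):
    """Ask the dispatcher for the next file name (None means nothing left)."""
    with Client(address) as conn:
        conn.send("next")
        return conn.recv()


def demo():
    listener = Listener(("127.0.0.1", 0), backlog=16)  # port 0: let the OS pick
    address = listener.address
    dispatcher = threading.Thread(
        target=run_dispatcher, args=(listener, ["a.bin", "b.bin", "c.bin"])
    )
    dispatcher.start()

    received = []

    def worker():
        while True:
            job = request_work(address)
            if job is None:
                break
            received.append(job)   # a real worker would download here

    workers = [threading.Thread(target=worker) for _ in range(3)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

    with Client(address) as conn:
        conn.send("shutdown")
    dispatcher.join()
    listener.close()
    return sorted(received)
```

In a real deployment each worker would be its own process calling `request_work` against a known address, and the dispatcher would read from the shared DB instead of a list.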
Upvotes: 1